[ 
https://issues.apache.org/jira/browse/HBASE-23730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023349#comment-17023349
 ] 

Jarred Li edited comment on HBASE-23730 at 1/25/20 12:39 AM:
-------------------------------------------------------------

Hi Michael, thank you very much for your comments.

I think we can use a flag file to indicate the completion of a flush. We 
generate two files during the "Flushcache" stage, let's say "76ce2875960" and 
"76ce2875960.inprogress". The "76ce2875960.inprogress" file is the flag file 
indicating that the flush is in progress. During the "Commit" stage, we don't 
call "rename" on the region file (which is a copy plus a delete on an object 
store); instead, we "delete" the flag file to complete the flush. Since the 
flag file is empty and no data has to be moved, this performs better. 
Moreover, "rename" on an object store is not an atomic operation, so it is 
error prone, whereas deleting an empty file is relatively simple.
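A minimal sketch of the flag-file idea, under stated assumptions: `InMemoryObjectStore` and `FlagFileFlusher` are hypothetical stand-ins (not HBase or S3 APIs), the store offers only put/delete/list (no rename), the flag is created *before* the data file, and readers treat an HFile as committed only when its ".inprogress" flag is gone. The ordering and the reader rule are assumptions; the comment above does not specify them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for an object store: keys map to bytes, no rename().
class InMemoryObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    void put(String key, byte[] data) { objects.put(key, data.clone()); }
    void delete(String key) { objects.remove(key); }
    boolean exists(String key) { return objects.containsKey(key); }

    List<String> list(String prefix) {
        return objects.keySet().stream()
                .filter(k -> k.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }
}

// Hypothetical flusher following the flag-file protocol sketched above.
class FlagFileFlusher {
    static final String IN_PROGRESS_SUFFIX = ".inprogress";
    private final InMemoryObjectStore store;

    FlagFileFlusher(InMemoryObjectStore store) { this.store = store; }

    // "Flushcache" stage: create the empty flag first (assumed ordering), then
    // write the HFile directly at its final key, so a crash leaves the flag behind.
    void flushcache(String hfileKey, byte[] memstoreSnapshot) {
        store.put(hfileKey + IN_PROGRESS_SUFFIX, new byte[0]);
        store.put(hfileKey, memstoreSnapshot);
    }

    // "Commit" stage: the only step is deleting the empty flag; no data is copied.
    void commit(String hfileKey) {
        store.delete(hfileKey + IN_PROGRESS_SUFFIX);
    }

    // Assumed reader rule: skip flag files and any HFile whose flag still exists.
    List<String> committedFiles(String familyDir) {
        return store.list(familyDir).stream()
                .filter(k -> !k.endsWith(IN_PROGRESS_SUFFIX))
                .filter(k -> !store.exists(k + IN_PROGRESS_SUFFIX))
                .collect(Collectors.toList());
    }
}
```

With this rule, a flush that dies between the two stages leaves "76ce2875960" plus its flag, and the half-written file stays invisible until cleanup, which is what the rename gave us on HDFS.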

 

The above solution is only for object stores such as S3. For HDFS, I think we 
should keep the current behavior, because rename in HDFS is an atomic operation.
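Keeping HDFS as is implies choosing the commit protocol per filesystem. A hypothetical sketch (`FlushCommitStrategy` and `CommitStrategies` are illustrative names, not HBase classes) that picks the strategy from the root directory's URI scheme; detecting object stores by scheme is an assumption, and an explicit configuration switch would be an equally valid design.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical: the two commit protocols discussed in this issue.
enum FlushCommitStrategy { RENAME_TMP_FILE, DELETE_FLAG_FILE }

final class CommitStrategies {
    private CommitStrategies() {}

    // Assumption: object stores are recognised by their URI scheme.
    static FlushCommitStrategy forRoot(URI rootDir) {
        String scheme = rootDir.getScheme() == null
                ? "" : rootDir.getScheme().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "s3":
            case "s3a":
            case "s3n":
                // Rename would be copy + delete: use the empty flag file instead.
                return FlushCommitStrategy.DELETE_FLAG_FILE;
            default:
                // HDFS: rename is an atomic metadata operation, keep .tmp + rename.
                return FlushCommitStrategy.RENAME_TMP_FILE;
        }
    }
}
```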

 



> Optimize Memstore Flush for Hbase on S3(Object Store)
> -----------------------------------------------------
>
>                 Key: HBASE-23730
>                 URL: https://issues.apache.org/jira/browse/HBASE-23730
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Jarred Li
>            Priority: Major
>
> The current Memstore Flush Process is divided into 2 stages:
>  # Flushcache: In this stage, a “.tmp” region file is written to S3/HDFS from 
> the memstore;
>  # Commit: In this stage, the “.tmp” file created in stage 1 is renamed 
> to the final destination of the HBase region file.
> The above design (flush and commit) is fine for HDFS because “rename” is a 
> lightweight operation (metadata only). However, for storage like S3 or other 
> object stores, rename is a “copy” followed by a “delete”.
> We can follow the same pattern as V2 of “FileOutputCommitter” in 
> MapReduce. That is, we can write the HFile directly to the S3 destination 
> directory without the “copy” and “delete”, so that we need fewer S3 operations 
> and the HBase memstore flush is more efficient.
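To make the cost argument concrete, here is a simplified, hypothetical cost model (`ObjectStoreCostModel` and `FlushCost` are illustrative names). It assumes an object-store rename is one server-side COPY of every byte plus one DELETE, and it ignores multipart uploads, listings, and HEAD requests. It compares the current .tmp + rename flow, a V2-style direct write, and the flag-file variant proposed in the comment.

```java
// Hypothetical counter of requests and bytes written on an object store.
final class ObjectStoreCostModel {
    long puts, copies, deletes, bytesWritten;

    void put(long bytes) { puts++; bytesWritten += bytes; }
    void delete() { deletes++; }

    // Assumed object-store "rename": COPY every byte, then DELETE the source.
    void rename(long bytes) { copies++; bytesWritten += bytes; delete(); }

    long requests() { return puts + copies + deletes; }
}

final class FlushCost {
    private FlushCost() {}

    // Current design: write the .tmp file (stage 1), then rename it (stage 2).
    static ObjectStoreCostModel tmpThenRename(long hfileBytes) {
        ObjectStoreCostModel m = new ObjectStoreCostModel();
        m.put(hfileBytes);
        m.rename(hfileBytes);
        return m;
    }

    // V2-style: write the HFile once, directly at its final location.
    static ObjectStoreCostModel directWrite(long hfileBytes) {
        ObjectStoreCostModel m = new ObjectStoreCostModel();
        m.put(hfileBytes);
        return m;
    }

    // Flag-file variant: empty flag, HFile at its final key, then delete the flag.
    static ObjectStoreCostModel flagFile(long hfileBytes) {
        ObjectStoreCostModel m = new ObjectStoreCostModel();
        m.put(0);
        m.put(hfileBytes);
        m.delete();
        return m;
    }
}
```

Under this model, a 128 MiB flush via .tmp + rename makes 3 requests and writes 256 MiB; the direct write makes 1 request and writes 128 MiB; the flag-file variant makes 3 requests but, like the direct write, writes the data only once while still marking unfinished flushes.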



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
