Jarred Li created HBASE-23730:
---------------------------------

             Summary: Optimize Memstore Flush for Hbase on S3(Object Store)
                 Key: HBASE-23730
                 URL: https://issues.apache.org/jira/browse/HBASE-23730
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: Jarred Li


The current memstore flush process is divided into two stages:
 # Flushcache: the memstore contents are written to a “.tmp” region file in 
S3/HDFS;
 # Commit: the “.tmp” file created in stage 1 is renamed to its final 
destination as an HBase region file.
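The two stages can be sketched against a local filesystem, where a rename is a cheap metadata operation, much as it is on HDFS. This is a minimal illustration using only java.nio.file; the class and method names (TwoStageFlushSketch, flushToTmp, commit) and the simplified directory layout are hypothetical and are not HBase's internal API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the flush/commit pattern; not HBase's real code path.
public class TwoStageFlushSketch {

    // Stage 1 (flushcache): write the memstore snapshot to a file under "<region>/.tmp".
    static Path flushToTmp(Path regionDir, String fileName, byte[] memstoreSnapshot) throws IOException {
        Path tmpDir = regionDir.resolve(".tmp");
        Files.createDirectories(tmpDir);
        Path tmpFile = tmpDir.resolve(fileName);
        Files.write(tmpFile, memstoreSnapshot);
        return tmpFile;
    }

    // Stage 2 (commit): rename the ".tmp" file into its final directory.
    // On a local filesystem (or HDFS) this only updates metadata; no data is copied.
    static Path commit(Path tmpFile, Path familyDir) throws IOException {
        Files.createDirectories(familyDir);
        Path finalFile = familyDir.resolve(tmpFile.getFileName());
        return Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path regionDir = Files.createTempDirectory("region");
        byte[] snapshot = "row1/cf:q/v1".getBytes(StandardCharsets.UTF_8);
        Path tmp = flushToTmp(regionDir, "hfile-0001", snapshot);
        Path committed = commit(tmp, regionDir.resolve("cf"));
        System.out.println(committed + " exists=" + Files.exists(committed));
    }
}
```

Writing to a ".tmp" location first keeps a half-written file out of the final directory until the commit step succeeds.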

The above design (flush, then commit) works well on HDFS because “rename” is a 
lightweight, metadata-only operation. On S3 and other object stores, however, 
there is no native rename: it is implemented as a “copy” followed by a 
“delete”, so its cost grows with the size of the file.
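The copy-plus-delete behaviour can be modelled with a toy in-memory object store. This is a hypothetical sketch, not the S3 client API: ObjectStoreRenameSketch and its request counters are illustrative names, and the model simply stores each object under a flat key with no real directories.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of an object store; not a real S3 client.
public class ObjectStoreRenameSketch {
    // Flat key -> bytes map: "directories" are only key prefixes.
    private final Map<String, byte[]> objects = new HashMap<>();

    // Request counters so the cost of a rename is visible.
    int puts, copies, deletes;
    long bytesCopied;

    void put(String key, byte[] data) {
        objects.put(key, data.clone());
        puts++;
    }

    // There is no native rename: emulate it as COPY to the new key, then DELETE the old key.
    // Unlike an HDFS rename, the copy touches every byte of the object.
    void rename(String src, String dst) {
        byte[] data = objects.get(src);
        if (data == null) {
            throw new IllegalArgumentException("no such object: " + src);
        }
        objects.put(dst, data);
        copies++;
        bytesCopied += data.length;
        objects.remove(src);
        deletes++;
    }

    boolean exists(String key) {
        return objects.containsKey(key);
    }

    public static void main(String[] args) {
        ObjectStoreRenameSketch store = new ObjectStoreRenameSketch();
        store.put("region/.tmp/hfile-0001", new byte[1024]);
        store.rename("region/.tmp/hfile-0001", "region/cf/hfile-0001");
        System.out.println("copies=" + store.copies + " deletes=" + store.deletes
                + " bytesCopied=" + store.bytesCopied);
    }
}
```

Running main prints `copies=1 deletes=1 bytesCopied=1024`: committing a single flushed file costs one extra copy of its full contents plus a delete.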

We can follow the same pattern as version 2 of the MapReduce 
“FileOutputCommitter” algorithm. That is, we can write the HFile directly to its 
final S3 destination directory, avoiding the “copy” and “delete” of a rename. 
This reduces the number of S3 operations and makes HBase memstore flush more 
efficient.
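The saving can be estimated by counting requests for both strategies on a toy object store. This is a hypothetical sketch: FlushStrategySketch, Store, flushViaTmp and flushDirect are illustrative names, and flushDirect models only the idea proposed here (write straight to the final key), not the exact semantics of FileOutputCommitter or any HBase implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of two flush strategies on a toy object store.
public class FlushStrategySketch {

    // In-memory object store that counts requests and bytes written (including server-side copies).
    static final class Store {
        final Map<String, byte[]> objects = new HashMap<>();
        int puts, copies, deletes;
        long bytesWritten;

        void put(String key, byte[] data) {
            objects.put(key, data);
            puts++;
            bytesWritten += data.length;
        }

        // Object stores emulate rename as COPY + DELETE.
        void rename(String src, String dst) {
            byte[] data = objects.remove(src);
            if (data == null) {
                throw new IllegalArgumentException("no such object: " + src);
            }
            objects.put(dst, data);
            copies++;
            deletes++;
            bytesWritten += data.length;
        }
    }

    // Current behaviour: write under ".tmp", then rename into the family directory.
    static void flushViaTmp(Store s, String region, String family, String file, byte[] hfile) {
        String tmpKey = region + "/.tmp/" + file;
        s.put(tmpKey, hfile);
        s.rename(tmpKey, region + "/" + family + "/" + file);
    }

    // Proposed behaviour (hypothetical): write the HFile straight to its final key.
    static void flushDirect(Store s, String region, String family, String file, byte[] hfile) {
        s.put(region + "/" + family + "/" + file, hfile);
    }

    public static void main(String[] args) {
        byte[] hfile = new byte[64 * 1024];
        Store viaTmp = new Store();
        Store direct = new Store();
        flushViaTmp(viaTmp, "r1", "cf", "hfile-0001", hfile);
        flushDirect(direct, "r1", "cf", "hfile-0001", hfile);
        System.out.printf("tmp+rename: puts=%d copies=%d deletes=%d bytes=%d%n",
                viaTmp.puts, viaTmp.copies, viaTmp.deletes, viaTmp.bytesWritten);
        System.out.printf("direct:     puts=%d copies=%d deletes=%d bytes=%d%n",
                direct.puts, direct.copies, direct.deletes, direct.bytesWritten);
    }
}
```

For a 64 KiB file, the tmp+rename path issues one PUT, one COPY and one DELETE and writes 131072 bytes, while the direct path issues a single PUT of 65536 bytes. The sketch deliberately ignores failure handling: the ".tmp" step is what keeps an incomplete flush out of the final directory, so a real direct-write design would need another way to keep uncommitted or orphaned files from being picked up.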



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
