[ https://issues.apache.org/jira/browse/HBASE-23730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024186#comment-17024186 ]
ramkrishna.s.vasudevan commented on HBASE-23730:
------------------------------------------------

bq. This route would work for S3 and HDFS, I think, if the WAL marker is used effectively in replays (and during read replicas - I think it is more important there). Then even in HDFS we can avoid the rename type of algo. Good point.

> Optimize Memstore Flush for Hbase on S3(Object Store)
> -----------------------------------------------------
>
>                 Key: HBASE-23730
>                 URL: https://issues.apache.org/jira/browse/HBASE-23730
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Jarred Li
>            Priority: Major
>
> The current Memstore Flush Process is divided into 2 stages:
> # Flushcache: In this stage, a “.tmp” region file is written in S3/HDFS for
> the memstore;
> # Commit: In this stage, the “.tmp” file created in stage 1 is renamed
> to the final destination of the HBase region file.
> The above design (flush and commit) is OK for HDFS because “rename” is a
> lightweight, metadata-only operation. However, for storage like S3 or other
> object stores, rename is a “copy” followed by a “delete”.
> We can follow the same pattern as V2 of “FileOutputCommitter” in
> MapReduce. That means we can write the hfile directly to the S3 destination
> directory without the “copy” and “delete”. This way we need fewer S3
> operations and the HBase memstore flush is more efficient.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
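
To make the cost difference in the issue description concrete, here is a minimal sketch that counts store requests for the two flush strategies. It is a toy in-memory model, not HBase or S3 code: `ToyObjectStore`, `flush_two_stage`, `flush_direct`, and the key layout are all hypothetical names chosen for illustration, and the only behaviour assumed from the issue is that a rename on an object store is a copy followed by a delete.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an S3-like store; counts requests."""
    objects: dict = field(default_factory=dict)
    ops: dict = field(default_factory=lambda: {"put": 0, "copy": 0, "delete": 0})

    def put(self, key, data):
        self.objects[key] = data
        self.ops["put"] += 1

    def rename(self, src, dst):
        # Assumption from the issue: no cheap rename, so copy then delete.
        self.objects[dst] = self.objects[src]
        self.ops["copy"] += 1
        del self.objects[src]
        self.ops["delete"] += 1


def flush_two_stage(store, region_dir, name, data):
    """Stage 1 writes under .tmp; stage 2 renames to the final key."""
    tmp_key = f"{region_dir}/.tmp/{name}"
    final_key = f"{region_dir}/{name}"
    store.put(tmp_key, data)
    store.rename(tmp_key, final_key)
    return final_key


def flush_direct(store, region_dir, name, data):
    """Write straight to the final key, as the issue proposes for S3."""
    final_key = f"{region_dir}/{name}"
    store.put(final_key, data)
    return final_key


two_stage, direct = ToyObjectStore(), ToyObjectStore()
flush_two_stage(two_stage, "region-a", "hfile-1", b"cells")
flush_direct(direct, "region-a", "hfile-1", b"cells")
print(sum(two_stage.ops.values()), sum(direct.ops.values()))  # prints "3 1"
```

Both stores end up with the same single object, but the two-stage path costs three requests against one. The sketch deliberately leaves out how a reader would know that a directly written file is complete; the comment above suggests relying on the WAL flush marker for that.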