[ 
https://issues.apache.org/jira/browse/HBASE-23730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023377#comment-17023377
 ] 

Michael Stack commented on HBASE-23730:
---------------------------------------

See WALUtil#writeFlushMarker. IIRC, we write a marker into the WAL when we 
start a flush, with detail such as the flush file name. I believe we do the 
same when done, writing a close marker. If the Region crashes during a flush, 
we'll replay the WAL on recovery. If we find a 'start' marker w/o a matching 
'close' marker, we can safely delete the file as incomplete. See how 
compaction does something similar. This way you wouldn't have to maintain a 
side file that may or may not be there when you most need it on the 
eventually consistent S3. This route would work for both S3 and HDFS (for 
HDFS it would be an improvement over the current mechanism, doing away w/ the 
namenode interaction needed for the rename).
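
A rough sketch of the replay check described above, to make the idea concrete. 
Everything in it is a hypothetical, simplified stand-in: the FlushMarker class, 
the two-value Action enum and findIncompleteFlushFiles are illustrative names, 
not HBase's actual WAL or flush-descriptor API, and real recovery would also 
have to deal with aborted flushes and with multiple files per flush.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlushMarkerReplay {

    // Hypothetical marker kinds; the real flush descriptor has its own action set.
    enum Action { START_FLUSH, COMMIT_FLUSH }

    // Hypothetical, simplified stand-in for a flush marker read back from the WAL.
    static final class FlushMarker {
        final Action action;
        final long flushSeqId;
        final String filePath;

        FlushMarker(Action action, long flushSeqId, String filePath) {
            this.action = action;
            this.flushSeqId = flushSeqId;
            this.filePath = filePath;
        }
    }

    /**
     * Walks the markers in WAL order and returns the files whose flush has a
     * start marker but no matching commit ("close") marker. Those files were
     * never completed and are candidates for deletion on region recovery.
     */
    static List<String> findIncompleteFlushFiles(List<FlushMarker> markersInWalOrder) {
        Map<Long, String> pending = new LinkedHashMap<>();
        for (FlushMarker marker : markersInWalOrder) {
            if (marker.action == Action.START_FLUSH) {
                pending.put(marker.flushSeqId, marker.filePath);
            } else {
                // A commit marker closes the flush with the same sequence id.
                pending.remove(marker.flushSeqId);
            }
        }
        return new ArrayList<>(pending.values());
    }

    public static void main(String[] args) {
        List<FlushMarker> wal = new ArrayList<>();
        wal.add(new FlushMarker(Action.START_FLUSH, 1L, "region/cf/hfile-1"));
        wal.add(new FlushMarker(Action.COMMIT_FLUSH, 1L, "region/cf/hfile-1"));
        // Simulated crash: flush 2 started but never committed.
        wal.add(new FlushMarker(Action.START_FLUSH, 2L, "region/cf/hfile-2"));

        for (String path : findIncompleteFlushFiles(wal)) {
            System.out.println("incomplete flush file, safe to delete: " + path);
        }
    }
}
```

The point of the sketch is that the WAL itself is the record of which flush 
files are complete, so no side file and no rename is needed to tell a finished 
hfile from a partial one.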

> Optimize Memstore Flush for Hbase on S3(Object Store)
> -----------------------------------------------------
>
>                 Key: HBASE-23730
>                 URL: https://issues.apache.org/jira/browse/HBASE-23730
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Jarred Li
>            Priority: Major
>
> The current Memstore Flush Process is divided into 2 stages:
>  # Flushcache: In this stage, a “.tmp” region file is written in S3/HDFS for 
> the memstore;
>  # Commit: In this stage, the “.tmp” file created in stage 1 is renamed 
> to the final destination of the HBase region file.
> The above design (flush and commit) is OK for HDFS because “rename” is a light 
> operation (metadata only). However, for storage like S3 or other 
> object stores, rename is a “copy” followed by a “delete”.
> We can follow the same pattern as V2 of “FileOutputCommitter” in 
> MapReduce. That is, we can write the hfile directly to the S3 destination 
> directory without the “copy” and “delete”. This way we need fewer S3 
> operations and the HBase memstore flush is more efficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
