[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

Anoop Sam John (Jira) Thu, 23 Jul 2020 06:01:34 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163542#comment-17163542
 ]


Anoop Sam John commented on HBASE-24749:
----------------------------------------

bq.On events such as flush and compaction, we write markers to the WAL w/ notes 
listing files that participated in the event. On recovery, we read these events 
completing compactions if all participants present and it looked like we 
crashed after compaction completed but before we got to slot the new files into 
place and remove the old.
This is not only for the META (or ROOT) table that you are suggesting, stack? 
(But generically?) Yes, during flush and compaction we have start markers 
listing the files involved in that op, and another marker on completion. During 
region recovery, we can use these markers to identify uncommitted files, right? 
Need to see whether there is a chance that the WAL gets rolled and we miss the 
start markers.
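
To make the question concrete, here is a minimal sketch of the recovery idea, 
assuming each op writes one START marker and one COMPLETE marker that carry the 
op's output files. The Marker class, the Phase enum and uncommittedFiles() are 
hypothetical names for illustration only, not the HBase WAL API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of flush/compaction WAL markers; not the HBase API. */
public class MarkerReplaySketch {

    enum Phase { START, COMPLETE }

    /** One marker: the op it belongs to, its phase, and the files the op writes. */
    static final class Marker {
        final long opId;
        final Phase phase;
        final List<String> outputFiles;

        Marker(long opId, Phase phase, List<String> outputFiles) {
            this.opId = opId;
            this.phase = phase;
            this.outputFiles = outputFiles;
        }
    }

    /**
     * Returns the output files of ops that have a START marker but no COMPLETE
     * marker among the WAL entries still available for replay.
     */
    static List<String> uncommittedFiles(List<Marker> wal) {
        Map<Long, List<String>> open = new LinkedHashMap<>();
        for (Marker m : wal) {
            if (m.phase == Phase.START) {
                open.put(m.opId, m.outputFiles);   // op began, not yet committed
            } else {
                open.remove(m.opId);               // op committed, nothing to clean up
            }
        }
        List<String> result = new ArrayList<>();
        for (List<String> files : open.values()) {
            result.addAll(files);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Marker> wal = List.of(
            new Marker(1, Phase.START, List.of("hfile-a")),
            new Marker(1, Phase.COMPLETE, List.of("hfile-a")),
            new Marker(2, Phase.START, List.of("hfile-b")));

        // Op 2 never completed, so its output file is reported as uncommitted.
        System.out.println(uncommittedFiles(wal));

        // If the WAL was rolled and op 2's START marker is gone, nothing is reported.
        System.out.println(uncommittedFiles(wal.subList(0, 2)));
    }
}
```

The second call shows the concern: once the START marker is no longer in the 
replayed WAL, this approach alone cannot tell that the file was never committed.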

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFiles validation after moving the 
> HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency check dependencies (e.g. DynamoDB) supported by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
