[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

Tak-Lon (Stephen) Wu (Jira) Wed, 22 Jul 2020 16:30:34 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163136#comment-17163136
 ]


Tak-Lon (Stephen) Wu commented on HBASE-24749:
----------------------------------------------

Thanks, Stack. HBASE-14090 and its [design doc|https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#]
 seem to share several directions with this proposal, e.g. how to use an hbase:meta column to track 
storefiles and how to build a cleaner that efficiently removes leftover storefiles. So we 
will review those design docs again to see what we can pick up within 
this proposed scope. Also, as [~elserj] pointed out on the dev@ list, Accumulo 
also manages its data (RFiles) with a metadata table. 
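One way to picture the "track storefiles in a metadata table, plus a cleaner" direction is the rough sketch below. Committed storefiles go into a tracked set that stands in for an hbase:meta column. The cleaner deletes only files that are both untracked and older than a grace period, so files an in-flight flush or compaction is still writing are left alone. `StoreFileTracker`, `find_leftovers`, and the grace-period parameter are hypothetical illustrations, not HBase or HBASE-14090 APIs.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFileTracker:
    """Hypothetical stand-in for a per-store 'tracked storefiles' column in hbase:meta."""
    tracked: set = field(default_factory=set)

    def commit(self, name: str) -> None:
        # Called only after the storefile has been fully written.
        self.tracked.add(name)

    def remove(self, name: str) -> None:
        # Called when a compaction retires the file.
        self.tracked.discard(name)


def find_leftovers(listed: dict, tracker: StoreFileTracker,
                   now: float, grace_seconds: float) -> list:
    """Return untracked files older than the grace period, i.e. safe to clean.

    `listed` maps file name -> last-modified time in seconds, as a listing of
    the store's data directory might report it. Young untracked files are
    skipped because a flush or compaction may still be writing them.
    """
    return sorted(
        name for name, mtime in listed.items()
        if name not in tracker.tracked and now - mtime >= grace_seconds
    )
```

For example, with `hfile-a` committed, `hfile-b` untracked and old, and `hfile-c` untracked but written 50 seconds ago, only `hfile-b` is reported. The grace period trades slower cleanup for not racing an in-progress writer.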

Without support from ZK, writing an extra edit to the WAL as a `commit` 
marker and replaying it when recovering the ROOT region (for hbase:meta and maybe 
the MasterRegion) makes sense. If a flush fails normally, we don't write 
a new marker, and we remove the storefile if it was already written. If an HFile was 
written successfully but without a commit marker, we probably need a repair hook 
(maybe HBCK) to consider bringing the written storefile back under tracking.
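Below is a toy model of the WAL commit-marker idea from the paragraph above. It assumes a single region, a WAL simulated as an in-memory list, and hypothetical `EDIT`/`COMMIT` entry kinds; none of these names are HBase APIs. It separates the two failure cases described there. A flush that fails normally writes no marker and cleans up its file. A crash between writing the HFile and appending the marker leaves a file that recovery cannot vouch for, so it is handed to a repair hook.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalEntry:
    kind: str     # hypothetical kinds: "EDIT" (data edit) or "COMMIT" (flush marker)
    payload: str  # the edit itself, or the committed storefile name


@dataclass
class Region:
    wal: list = field(default_factory=list)     # append-only WAL, simulated
    data_dir: set = field(default_factory=set)  # files physically in the store dir

    def flush(self, hfile: str, outcome: str = "ok") -> bool:
        # Direct insert: the HFile is written straight into the data directory.
        self.data_dir.add(hfile)
        if outcome == "failed":
            # Normal failure: no marker is written and the file is removed.
            self.data_dir.discard(hfile)
            return False
        if outcome == "crash":
            # Simulated crash after the file write but before the marker:
            # the file stays behind with no COMMIT entry.
            return False
        self.wal.append(WalEntry("COMMIT", hfile))
        return True


def recover(wal: list, data_dir: set):
    """Replay the WAL to rebuild the tracked set; report uncommitted files."""
    committed = {e.payload for e in wal if e.kind == "COMMIT"}
    tracked = sorted(committed & data_dir)
    # Written but never committed: candidates for a repair hook (e.g. HBCK)
    # to either start tracking again or delete.
    uncommitted = sorted(data_dir - committed)
    return tracked, uncommitted
```

Flushing `hf1` successfully, `hf2` with a normal failure, and `hf3` with a simulated crash yields `(["hf1"], ["hf3"])` from `recover`. `hf2` never appears because its failure path already deleted it.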

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) that removes the {{.tmp}} 
> directory used in the commit stage of common HFile operations such as flush 
> and compaction, to improve write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see the attachments for this proposal and the initial results captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN 
> and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense, and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove dependencies on consistency checks (e.g. DynamoDB) provided by the file 
> system implementation
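To illustrate why dropping {{.tmp}} matters on object stores, the rough sketch below lists the storage requests of each commit path for one HFile. It assumes an S3-style store where rename is not atomic and is implemented as a server-side copy followed by a delete, which is how Hadoop's S3A connector handles it. The `cf/` family directory and the operation tuples are hypothetical. Real flushes and compactions issue more requests than shown.

```python
def commit_via_tmp(name: str) -> list:
    """Write under .tmp, then 'rename' into the family directory."""
    src, dst = f".tmp/{name}", f"cf/{name}"
    return [
        ("PUT", src),        # write the new HFile under .tmp
        ("COPY", src, dst),  # rename step 1 on an object store: server-side copy
        ("DELETE", src),     # rename step 2: delete the .tmp object
    ]


def commit_direct(name: str, tracked: set) -> list:
    """Write straight into the family directory, then track the file."""
    ops = [("PUT", f"cf/{name}")]  # the only object-store request on this path
    tracked.add(name)              # in-memory tracking, to be persisted separately
    return ops
```

In this model the direct path issues one object-store request instead of three and avoids a copy whose duration grows with HFile size. The trade-off is that the set of live files must now be persisted somewhere other than the directory layout.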



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
