[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183739#comment-17183739
 ] 

Tak-Lon (Stephen) Wu commented on HBASE-24749:
----------------------------------------------

I updated the [design 
doc|https://docs.google.com/document/d/15Nx-xZ7FoPoud9vqkmIwphkNwBv0mdKMkvU7Ley5i4A/edit?usp=sharing]
 (Google Doc version). Please leave any design-related comments there 
directly to avoid a long page of comments in this JIRA.

In addition, I wonder whether we can simplify the mechanism for tracking the 
HFiles of the ROOT region by relying directly on the file storage (assuming the 
WAL works fine and HFiles are always immutable), without adding a tracking 
layer, and by writing HFiles directly to the data directory. E.g., the 
dependency flow would be:
 # On (re)open, the ROOT region only cares about the HFiles in the data 
directory of the ROOT region (relying on MVCC protection to decide which files 
should be included). 
 # HFile tracking for hbase:meta is written to the ROOT region (similar to how 
the meta location is handled), and this tracking metadata is protected by the 
WAL of the ROOT region and the HFiles in the data directory of the ROOT region.
 # HFile tracking for all other tables is written to a column family 
cf:storefile in hbase:meta. It is read extensively only during region open and 
region assignment.
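
The three steps above can be read as a simple routing rule: given a table, where is the list of its live HFiles recorded? The following is a minimal sketch of that rule only. Every class, enum, and method name is hypothetical and not part of HBase, and the ROOT table name "hbase:root" is an assumption, since the ROOT table is not named here.

```java
// Illustrative only: every name here is hypothetical, not an HBase API.
final class HFileTrackingSketch {

    /** Where the set of live HFiles for a table would be recorded. */
    enum TrackingStore {
        ROOT_DATA_DIRECTORY, // step 1: no tracking layer, the directory listing is used
        ROOT_REGION,         // step 2: tracking metadata stored in the ROOT region
        META_CF_STOREFILE    // step 3: tracking cells in hbase:meta, family cf:storefile
    }

    // Assumed name for the ROOT table; the proposal text does not spell it out.
    static final String ROOT_TABLE = "hbase:root";
    static final String META_TABLE = "hbase:meta";

    /** Maps a table name to the place its HFile tracking would live. */
    static TrackingStore trackingStoreFor(String tableName) {
        if (ROOT_TABLE.equals(tableName)) {
            return TrackingStore.ROOT_DATA_DIRECTORY;
        }
        if (META_TABLE.equals(tableName)) {
            return TrackingStore.ROOT_REGION;
        }
        return TrackingStore.META_CF_STOREFILE;
    }

    public static void main(String[] args) {
        String[] tables = {ROOT_TABLE, META_TABLE, "default:usertable"};
        for (String table : tables) {
            System.out.println(table + " -> " + trackingStoreFor(table));
        }
    }
}
```

Each level depends only on the level above it, so the chain bottoms out at the ROOT data directory, which needs no tracking metadata of its own.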

We have provided an [investigation (Appendix#1) within the new design doc 
|https://docs.google.com/document/d/15Nx-xZ7FoPoud9vqkmIwphkNwBv0mdKMkvU7Ley5i4A/edit?usp=sharing]
 showing that the MVCC (max seq#) in the Store is the guard that allows cells 
to be reloaded from HFiles in the data directory without the tracking metadata 
and the .tmp directory. But we should only use this for the ROOT region, 
because the number of HFiles in the ROOT directory is limited and normally 
won't change frequently.
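
As a rough illustration of a max-sequence-id guard (not the analysis in the appendix itself), the sketch below assumes each HFile found in the data directory carries the highest sequence id it contains, and that on reopen only WAL edits above the largest such id need to be replayed. The types HFileInfo and WalEdit and both methods are hypothetical stand-ins, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical stand-ins, not HBase classes.
final class MaxSeqIdReplaySketch {

    /** Stand-in for an HFile discovered by listing the data directory. */
    static final class HFileInfo {
        final String name;
        final long maxSeqId; // highest sequence id persisted in this file

        HFileInfo(String name, long maxSeqId) {
            this.name = name;
            this.maxSeqId = maxSeqId;
        }
    }

    /** Stand-in for one WAL entry. */
    static final class WalEdit {
        final long seqId;

        WalEdit(long seqId) {
            this.seqId = seqId;
        }
    }

    /** Largest sequence id already persisted in any listed HFile, or -1 if none. */
    static long maxPersistedSeqId(List<HFileInfo> files) {
        long max = -1L;
        for (HFileInfo file : files) {
            max = Math.max(max, file.maxSeqId);
        }
        return max;
    }

    /** Keeps only the WAL edits that are not yet covered by the listed HFiles. */
    static List<WalEdit> editsToReplay(List<HFileInfo> files, List<WalEdit> wal) {
        long guard = maxPersistedSeqId(files);
        List<WalEdit> pending = new ArrayList<>();
        for (WalEdit edit : wal) {
            if (edit.seqId > guard) {
                pending.add(edit);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<HFileInfo> files = new ArrayList<>();
        files.add(new HFileInfo("hfile-a", 10L));
        files.add(new HFileInfo("hfile-b", 25L));
        List<WalEdit> wal = new ArrayList<>();
        wal.add(new WalEdit(20L));
        wal.add(new WalEdit(30L));
        System.out.println("edits to replay: " + editsToReplay(files, wal).size());
    }
}
```

In this sketch the guard is recomputed from a directory listing on every open, which stays cheap only while the number of HFiles is small.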

 

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see the attachments for this proposal and the initial results captured 
> with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and 
> RUN, and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency-check dependencies (e.g. DynamoDB) supported by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
