Tak-Lon (Stephen) Wu created HBASE-24749:
--------------------------------------------

             Summary: Direct insert HFiles and Persist in-memory HFile tracking
                 Key: HBASE-24749
                 URL: https://issues.apache.org/jira/browse/HBASE-24749
             Project: HBase
          Issue Type: Umbrella
          Components: Compaction, HFile, Zookeeper
    Affects Versions: 3.0.0-alpha-1
            Reporter: Tak-Lon (Stephen) Wu
         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
insert HFiles and Persist in-memory HFile tracking.pdf

We propose removing the {{.tmp}} directory used in the commit stage of common 
HFile operations such as flush and compaction, to improve write throughput 
and latency on object stores. For S3 filesystems specifically, this also 
mitigates read-after-write inconsistencies caused by validating HFiles 
immediately after moving them to the data directory.
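
To illustrate why the extra move matters on an object store, here is a minimal 
sketch, not HBase code. It assumes a toy in-memory store in which a rename is 
modeled as a copy followed by a delete, since object stores such as S3 have no 
native rename. The names ToyObjectStore, CommitPathSketch, commitViaTmp and 
commitDirect, as well as the key layout, are hypothetical and exist only for 
this example. The persisted HFile tracking part of the proposal is not 
modeled.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory object store; a hypothetical stand-in, not an HBase or S3 API. */
class ToyObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    int requests = 0; // number of mutating requests issued against the store

    void put(String key, byte[] data) {
        requests++; // one upload
        objects.put(key, data.clone());
    }

    /** No native rename: modeled as a copy to the new key plus a delete of the old one. */
    void rename(String src, String dst) {
        byte[] data = objects.get(src);
        if (data == null) {
            throw new IllegalArgumentException("no such object: " + src);
        }
        requests++; // copy to the destination key
        objects.put(dst, data.clone());
        requests++; // delete of the source key
        objects.remove(src);
    }

    /** Lookup helper for the example; not counted as a request. */
    boolean exists(String key) {
        return objects.containsKey(key);
    }
}

class CommitPathSketch {
    /** Commit through a temporary key: write under .tmp, then move into the family directory. */
    static String commitViaTmp(ToyObjectStore store, String region, String family,
                               String name, byte[] hfile) {
        String tmpKey = region + "/.tmp/" + name;            // assumed layout, for illustration
        String finalKey = region + "/" + family + "/" + name;
        store.put(tmpKey, hfile);
        store.rename(tmpKey, finalKey);
        return finalKey;
    }

    /** Direct insert: write the file once, at its final key. */
    static String commitDirect(ToyObjectStore store, String region, String family,
                               String name, byte[] hfile) {
        String finalKey = region + "/" + family + "/" + name;
        store.put(finalKey, hfile);
        return finalKey;
    }
}
```

In this toy model the commit through {{.tmp}} issues three mutating requests 
per file (upload, copy, delete) and moves the file's bytes twice, while the 
direct path issues a single upload. Real stores differ in how a copy is 
priced and executed, so treat the counts as a picture of the extra work, not 
as a measurement.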

Please see the attachments for the proposal and the initial results, captured 
with 25m (25m operations) and 1B (100m operations) YCSB runs of workload A 
LOAD and RUN, and workload C RUN.

The goal of this JIRA is to discuss with the community whether the proposed 
improvement for the object store use case makes sense, and whether we have 
missed anything that should be included.

Improvement Highlights
 1. Lower write latency, especially the p99+
 2. Higher write throughput on flush and compaction 
 3. Lower MTTR on region (re)open or assignment 
 4. Remove consistency check dependencies (e.g. DynamoDB) supported by file 
system imple



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
