[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

Michael Stack (Jira) Fri, 11 Dec 2020 17:18:07 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248272#comment-17248272
 ]


Michael Stack commented on HBASE-24749:
---------------------------------------

bq. Then we will follow up with tasks to merge it into hbase:meta via a single 
writer and show the Y write throughput that's not far from the system table 
approach. 

Sounds good. (A few recent paragraphs in the doc linked below explain why I'm 
concerned when I see mention of a new system table; see '2.0.1 Avoid 
compounding of Region Assignment Complexity' in 
https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit#)

bq. I'm wondering what the testing scope of hbase-on-s3 could be? Are we testing 
the functionality of using the S3A/DFS API to perform write operations?

Could start small. Configure MiniHBaseCluster so it runs on S3, then run a 
subset of tests that grows over time, proving that HBase works on S3 across the 
variety of failures the test suite exercises. (HBase has its own set of machines 
attached to Apache infrastructure, donated by Xiaomi. These machines are EC2 
instances, if that helps.)

Good stuff.



> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency check dependencies (e.g. DynamoDB) supported by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
