[jira] [Comment Edited] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

Zach York (Jira) Thu, 23 Jul 2020 20:38:25 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164094#comment-17164094
 ]


Zach York edited comment on HBASE-24749 at 7/24/20, 3:37 AM:
-------------------------------------------------------------

Yes, I think that is potentially an alternative implementation that could work.
One downside I could see is that you would still need to be able to handle bulk
loading and other procedures. If all updates to the state are controlled by the
RS, this approach would work. I wonder what the perf difference might be, since
in this case you would always have to replay edits.

Edit: After thinking it through a bit, the WAL approach has one problem in our
environment (where we do not expect the HDFS WALs to be migrated to a new
cluster). Storing the data in a table is more durable for our use case, but the
WAL implementation could be suitable for the ROOT table, where it matters less
if the file list needs to fall back to FS listing/validation.
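
To make the replay cost mentioned above concrete, here is a minimal sketch of a file list that is persisted only as an append-only log of edits. Every name in it (FileListEdit, ReplayingFileTracker, recordFlush, recordCompaction) is hypothetical and is not an HBase class or API. It assumes the tracker is the only writer of the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical edit record: one change to a store's file list. Not an HBase class. */
final class FileListEdit {
    enum Kind { ADD, REMOVE }

    final Kind kind;
    final String fileName;

    FileListEdit(Kind kind, String fileName) {
        this.kind = kind;
        this.fileName = fileName;
    }
}

/** Hypothetical tracker whose only persistent state is an append-only list of edits. */
final class ReplayingFileTracker {
    private final List<FileListEdit> log = new ArrayList<>();

    /** A flush adds one new file. */
    void recordFlush(String newFile) {
        log.add(new FileListEdit(FileListEdit.Kind.ADD, newFile));
    }

    /** A compaction removes its inputs and adds a single output. */
    void recordCompaction(List<String> inputs, String output) {
        for (String in : inputs) {
            log.add(new FileListEdit(FileListEdit.Kind.REMOVE, in));
        }
        log.add(new FileListEdit(FileListEdit.Kind.ADD, output));
    }

    /** Rebuilds the live file set by replaying every edit in order. */
    Set<String> replay() {
        Set<String> live = new LinkedHashSet<>();
        for (FileListEdit e : log) {
            if (e.kind == FileListEdit.Kind.ADD) {
                live.add(e.fileName);
            } else {
                live.remove(e.fileName);
            }
        }
        return live;
    }

    /** Number of edits a reopen would have to replay. */
    int editCount() {
        return log.size();
    }
}
```

In this sketch the work done by replay() grows with the number of logged edits rather than with the number of live files, and a file added by anything other than the tracker (a bulk load, for example) would never show up in the replayed set.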


was (Author: zyork):
Yes, I think that is potentially an alternative implementation that could work.
One downside I could see is that you would still need to be able to handle bulk
loading and other procedures. If all updates to the state are controlled by the
RS, this approach would work. I wonder what the perf difference might be, since
in this case you would always have to replay edits.

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see the attachments for the proposal and the initial results captured 
> with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and 
> RUN, and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove the dependency on consistency checks (e.g. DynamoDB) provided by 
> the file system implementation
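
As a rough illustration of the difference the description is aiming at, the sketch below contrasts a commit that goes through a {{.tmp}} directory with one that writes in place and then publishes the file name in a tracked list. It uses the local filesystem through java.nio.file purely for demonstration. CommitSketch, commitViaTmp, commitDirect and visibleFiles are hypothetical names, not HBase APIs, and the assumption that readers consult a tracked list instead of listing the directory is one reading of the proposal title, not the attached design.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of two commit styles; none of these names are HBase APIs. */
final class CommitSketch {
    /** File names that readers are allowed to see in the direct style. */
    private final Set<String> tracked = new TreeSet<>();

    /** Tmp style: write under .tmp, then move the file into the store directory. */
    static Path commitViaTmp(Path storeDir, String name, byte[] data) throws IOException {
        Path tmpDir = Files.createDirectories(storeDir.resolve(".tmp"));
        Path tmpFile = Files.write(tmpDir.resolve(name), data);
        // On an object store such as S3 a rename is typically a copy followed by a delete.
        return Files.move(tmpFile, storeDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Direct style: write in place, then publish the name in the tracked list. */
    Path commitDirect(Path storeDir, String name, byte[] data) throws IOException {
        Path finalFile = Files.write(storeDir.resolve(name), data);
        tracked.add(name); // the file counts as committed only after this step
        return finalFile;
    }

    /** Readers consult the tracked list instead of listing the directory. */
    Set<String> visibleFiles() {
        return new TreeSet<>(tracked);
    }
}
```

In the direct style of this sketch, a file that was written but never added to the tracked list (for instance after a crash mid-write) is simply not visible to readers, which is the role the rename out of {{.tmp}} plays in the first style.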



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
