[
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164147#comment-17164147
]
Michael Stack commented on HBASE-24749:
---------------------------------------
Just a thought: keeping the HFile set in hbase:meta is going to increase the
read/write load on this table significantly. Every flush and compaction will
result in an update inline with the flush/compaction completion (if the update
fails, does the flush/compaction fail?), and every region open will need an
hbase:meta read to find the set of files to use. Currently only the Master
writes hbase:meta, so either the RS tells the Master the compaction result
(fine by me, because the Master should be running compactions anyway) or the
newly flushed file, and the Master does the update; or the RS writes meta
itself, which violates a simplification we made to ensure a single writer.
Just a note.
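To make the load concern concrete, here is a toy sketch of the single-writer option, where the RS reports each flush or compaction to the Master and only the Master updates the table. Every class and method name here (MetaTable, Master, RegionServer, report_flush, report_compaction) is hypothetical and is not an HBase API. It only counts how many meta writes and reads a short sequence of operations would generate.

```python
from collections import defaultdict


class MetaTable:
    """Toy stand-in for hbase:meta that counts reads and writes (hypothetical)."""

    def __init__(self):
        self.files = defaultdict(set)  # region name -> set of HFile names
        self.reads = 0
        self.writes = 0

    def put_files(self, region, files):
        self.writes += 1
        self.files[region] = set(files)

    def get_files(self, region):
        self.reads += 1
        return set(self.files[region])


class Master:
    """Sole writer to the meta table in this sketch (assumed design)."""

    def __init__(self, meta):
        self.meta = meta

    def report_flush(self, region, new_file):
        # One meta write per flush.
        current = self.meta.files[region]
        self.meta.put_files(region, current | {new_file})

    def report_compaction(self, region, inputs, output):
        # One meta write per compaction: swap the inputs for the output.
        current = self.meta.files[region]
        self.meta.put_files(region, (current - set(inputs)) | {output})


class RegionServer:
    """Reports results to the Master instead of writing meta itself."""

    def __init__(self, master, meta):
        self.master = master
        self.meta = meta

    def flush(self, region, new_file):
        self.master.report_flush(region, new_file)

    def compact(self, region, inputs, output):
        self.master.report_compaction(region, inputs, output)

    def open_region(self, region):
        # One meta read per region open to find the files to use.
        return self.meta.get_files(region)


meta = MetaTable()
master = Master(meta)
rs = RegionServer(master, meta)
rs.flush("r1", "f1")
rs.flush("r1", "f2")
rs.compact("r1", ["f1", "f2"], "f3")
files = rs.open_region("r1")
print(sorted(files), meta.writes, meta.reads)
```

Two flushes and one compaction cost three meta writes here, and the open costs one read. The sketch leaves open the question raised above of what happens to the flush or compaction when the meta update fails.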
> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
> Issue Type: Umbrella
> Components: Compaction, HFile
> Affects Versions: 3.0.0-alpha-1
> Reporter: Tak-Lon (Stephen) Wu
> Assignee: Tak-Lon (Stephen) Wu
> Priority: Major
> Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}}
> directory used in the commit stage for common HFile operations such as flush
> and compaction to improve the write throughput and latency on object stores.
> Specifically for S3 filesystems, this will also mitigate read-after-write
> inconsistencies caused by validating HFiles immediately after moving the
> HFile(s) to the data directory.
> Please see the attached proposal and the initial results captured with
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN,
> and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed
> improvement for the object store use case makes sense and whether we have
> missed anything that should be included.
> Improvement Highlights
> 1. Lower write latency, especially the p99+
> 2. Higher write throughput on flush and compaction
> 3. Lower MTTR on region (re)open or assignment
> 4. Remove the dependency on consistency checks (e.g. DynamoDB) provided by
> the file system implementation