[
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163163#comment-17163163
]
Michael Stack commented on HBASE-24749:
---------------------------------------
{quote}and if any HFile is being written successfully without an event marker, we
probably need a repair hook (maybe HBCK) to consider including the written
storefile back to be tracked.
{quote}
If an HFile is written successfully but there is no marker for it in the WAL,
then it doesn't exist, right? As part of the WAL replay, you will reconstitute
it from the edits in the WAL?
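A minimal sketch of that rule, assuming a simplified WAL made of put records and
flush markers. All names here (Put, FlushMarker, recover) are hypothetical and
are not HBase APIs; real WAL replay is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Put:
    """Hypothetical WAL edit: one cell write tagged with its sequence id."""
    seq_id: int
    row: str
    value: str


@dataclass(frozen=True)
class FlushMarker:
    """Hypothetical WAL marker: `hfile` holds every edit up to `flushed_seq_id`."""
    flushed_seq_id: int
    hfile: str


WalEntry = Union[Put, FlushMarker]


def recover(
    wal: List[WalEntry], files_on_disk: Set[str]
) -> Tuple[Set[str], Dict[str, str], Set[str]]:
    """Return (tracked HFiles, memstore rebuilt by replay, untracked files)."""
    tracked: Set[str] = set()
    max_flushed = -1
    # Pass 1: a file "exists" only if a marker in the WAL vouches for it.
    for entry in wal:
        if isinstance(entry, FlushMarker) and entry.hfile in files_on_disk:
            tracked.add(entry.hfile)
            max_flushed = max(max_flushed, entry.flushed_seq_id)

    # Pass 2: replay every edit that no tracked file covers.
    memstore: Dict[str, str] = {}
    for entry in wal:
        if isinstance(entry, Put) and entry.seq_id > max_flushed:
            memstore[entry.row] = entry.value

    # Files written without a marker are ignored; their edits were replayed.
    untracked = files_on_disk - tracked
    return tracked, memstore, untracked
```

In this sketch, a file that was fully written but crashed before its marker
reached the WAL ends up in the untracked set, and the edits it contained come
back through replay, so nothing has to be repaired after the fact.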
As for HBASE-14090: it is old but still cool and virtuous, and it aims at a
bigger target.
A question for you that I think might be of general utility: have you surveyed
the calls HBase makes to the NN on a regular basis? It would be good to get a
list of renames (files and dirs) for your project, but my sense is also that we
are profligate w/ our NN calls. With a survey and report, we might be able to
up performance, at least around MTTR. Anyways, just a thought.
> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
> Issue Type: Umbrella
> Components: Compaction, HFile
> Affects Versions: 3.0.0-alpha-1
> Reporter: Tak-Lon (Stephen) Wu
> Assignee: Tak-Lon (Stephen) Wu
> Priority: Major
> Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) that removes the {{.tmp}}
> directory used in the commit stage of common HFile operations such as flush
> and compaction, to improve write throughput and latency on object stores.
> Specifically for S3 filesystems, this will also mitigate read-after-write
> inconsistencies caused by validating HFile(s) immediately after moving them
> to the data directory.
> Please see the attachments for the proposal and the initial results captured
> with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and
> RUN, and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed
> improvement for the object store use case makes sense and whether we missed
> anything that should be included.
> Improvement Highlights
> 1. Lower write latency, especially the p99+
> 2. Higher write throughput on flush and compaction
> 3. Lower MTTR on region (re)open or assignment
> 4. Remove consistency check dependencies (e.g. DynamoDB) supported by the
> file system implementation