[ https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248272#comment-17248272 ]
Michael Stack commented on HBASE-24749:
---------------------------------------
bq. Then we will follow up with tasks to merge it into hbase:meta via a single
writer and show the Y write throughput that's not far from the system table
approach.
Sounds good. (There are a few recent paragraphs that explain why I'm concerned
when I see mention of a new System Table. See '2.0.1 Avoid compounding of
Region Assignment Complexity' in
https://docs.google.com/document/d/11ChsSb2LGrSzrSJz8pDCAw5IewmaMV0ZDN1LrMkAj4s/edit#)
bq. I'm wondering what the testing scope of hbase-on-s3 could be? Are we testing
the functionality of using the S3A/DFS API to perform write operations?
Could start small. Configure MiniHBaseCluster so it's on S3, then run a subset
of tests that grows over time, proving that HBase works on S3 across the variety
of failures the test suite is full of. (HBase has its own set of machines
attached to Apache infrastructure, donated by Xiaomi. These machines are EC2
instances, if that helps.)
Good stuff.
> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
> Issue Type: Umbrella
> Components: Compaction, HFile
> Affects Versions: 3.0.0-alpha-1
> Reporter: Tak-Lon (Stephen) Wu
> Assignee: Tak-Lon (Stephen) Wu
> Priority: Major
> Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}}
> directory used in the commit stage for common HFile operations such as flush
> and compaction to improve the write throughput and latency on object stores.
> Specifically for S3 filesystems, this will also mitigate read-after-write
> inconsistencies caused by immediate HFile validation after moving the
> HFile(s) to the data directory.
> Please see the attached proposal and the initial results captured with
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN,
> and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed
> improvement for the object store use case makes sense and whether we missed
> anything that should be included.
> Improvement Highlights
> 1. Lower write latency, especially the p99+
> 2. Higher write throughput on flush and compaction
> 3. Lower MTTR on region (re)open or assignment
> 4. Remove consistency check dependencies (e.g. DynamoDB) supported by the
> file system implementation
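
To make the difference between the two commit paths in the description above
concrete, here is a minimal sketch in plain Java. It is illustrative only and
is not HBase code: the class CommitPathSketch, the TrackedFiles helper and both
method names are hypothetical, it runs against the local filesystem through
java.nio rather than S3A, and persisting the tracking data is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only, not HBase code: contrasts two ways of committing a store file. */
public class CommitPathSketch {

    /** Hypothetical stand-in for the HFile tracking described in the proposal (in memory only). */
    static final class TrackedFiles {
        private final Set<String> committed = new TreeSet<>();

        synchronized void add(String name) {
            committed.add(name);
        }

        synchronized boolean contains(String name) {
            return committed.contains(name);
        }
    }

    /** Existing style: write under .tmp, then rename into the data directory. */
    static Path commitViaTmp(Path storeDir, String name, byte[] data) throws IOException {
        Path tmpDir = Files.createDirectories(storeDir.resolve(".tmp"));
        Path tmpFile = Files.write(tmpDir.resolve(name), data);
        // On S3-like object stores a rename is usually a copy plus a delete,
        // so this step is where the extra commit cost comes from.
        return Files.move(tmpFile, storeDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Proposed direction: write in place, then record the file as committed. */
    static Path commitDirect(Path storeDir, String name, byte[] data, TrackedFiles tracker)
            throws IOException {
        Files.createDirectories(storeDir);
        Path file = Files.write(storeDir.resolve(name), data);
        // Readers are assumed to consult the tracker, so a file that was only
        // partially written never counts as part of the store.
        tracker.add(name);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path storeDir = Files.createTempDirectory("store-sketch");
        byte[] data = "cells".getBytes(StandardCharsets.UTF_8);
        TrackedFiles tracker = new TrackedFiles();

        Path viaTmp = commitViaTmp(storeDir, "hfile-a", data);
        Path direct = commitDirect(storeDir, "hfile-b", data, tracker);

        System.out.println("via .tmp exists: " + Files.exists(viaTmp));
        System.out.println("direct exists:   " + Files.exists(direct));
        System.out.println("direct tracked:  " + tracker.contains("hfile-b"));
    }
}
```

The direct path drops the move entirely, which is the step the description
identifies as costly on object stores. The trade-off is that correctness then
depends on the tracking data rather than on the directory listing.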
--
This message was sent by Atlassian Jira
(v8.3.4#803005)