[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164037#comment-17164037
 ] 

Tak-Lon (Stephen) Wu edited comment on HBASE-24749 at 7/24/20, 12:03 AM:
-------------------------------------------------------------------------

bq. If an HFile is written successfully but no marker in the WAL, then it 
doesn't exist, right? As part of the WAL replay you will reconstitute it from 
edits in the WAL?
You're right. If that happens, any uncommitted HFile should not be picked up; 
the data should be replayed from the WALs instead. (Assuming replay may 
regenerate the same content in a different HFile, I was thinking about a 
store-open optimization there, but that is too far out for now.)
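To make the idea concrete, here is a minimal, hypothetical sketch (not HBase code; all names are illustrative) of store open under persisted HFile tracking: only files recorded in the tracking list are loaded, and anything else found in the store directory is treated as uncommitted and left to WAL replay.

```python
# Hypothetical sketch of store open with persisted HFile tracking.
# "committed_files" stands in for whatever durable tracking store the
# design uses; it is not an actual HBase API.

def select_store_files(files_on_disk, committed_files):
    """Split on-disk HFiles into committed (loadable) and uncommitted.

    Uncommitted files are ignored at open; their data is recovered by
    replaying the WALs instead.
    """
    committed = [f for f in files_on_disk if f in committed_files]
    uncommitted = [f for f in files_on_disk if f not in committed_files]
    return committed, uncommitted

# An HFile written by a crashed flush but never committed is skipped:
on_disk = ["hfile-a", "hfile-b", "hfile-crashed"]
tracked = {"hfile-a", "hfile-b"}
loadable, ignored = select_store_files(on_disk, tracked)
```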

bq. you have surveyed the calls to the NN made by HBase on a regular basis?
We haven't captured how many rename calls go to the NN, or even to object 
stores, and we will add a measurement/survey task to the related milestone. 
But at one point, while running compaction with rename, we captured that the 
rename step alone dominated ~60% of the overall compaction wall-clock time 
on object stores.
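As a rough illustration of that measurement (helper names and numbers below are hypothetical, not from the survey task), the rename share of a compaction's wall-clock time can be computed as:

```python
# Hypothetical instrumentation sketch: what fraction of a compaction's
# wall-clock time is spent in the commit/rename step. On an object store,
# a "rename" is typically a copy plus delete, so this fraction can dominate.

def rename_fraction(write_secs, rename_secs):
    """Fraction of total compaction time spent renaming (0.0 if no time)."""
    total = write_secs + rename_secs
    return rename_secs / total if total else 0.0

# e.g. 4s writing the compacted HFile, 6s committing it via rename:
share = rename_fraction(4.0, 6.0)  # 0.6, i.e. ~60% of wall-clock time
```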

bq. IIRC there is a issue for storing the compacted files in HFile's metadata, 
to solve the problem that the wal file contains the compaction marker may be 
deleted before wal splitting.
It should be HBASE-20724; for compaction we can reuse that to confirm whether 
a flushed StoreFile came from a compaction.
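A minimal sketch of the HBASE-20724 idea, assuming the compaction provenance is stored as a key in the HFile's own metadata (the dict and key name below are illustrative, not the actual file-info key):

```python
# Hypothetical sketch: record in the HFile's own metadata whether it was
# produced by a compaction (and from which inputs), so recovery can tell
# a compacted StoreFile from a plain flush without relying on a WAL marker.

COMPACTION_EVENT_KEY = "COMPACTION_EVENT"  # illustrative key name

def build_hfile_metadata(is_compaction_output, compacted_inputs=()):
    """Build the metadata map written alongside the HFile."""
    meta = {}
    if is_compaction_output:
        meta[COMPACTION_EVENT_KEY] = list(compacted_inputs)
    return meta

def came_from_compaction(meta):
    """True if this HFile's metadata says it is compaction output."""
    return COMPACTION_EVENT_KEY in meta
```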

bq. Now while replay of wal, we dont have start compaction marker for this wal 
file. So we think this is an old valid file but that is wrong. This is a 
partial file. This is possible.
Don't we write the `end` compaction event marker only once the compacted 
HFile(s) have been moved to the CF directory, right before updating the store 
file manager? But yes, if the WAL is rolled, we lose this event marker.
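The failure mode above can be sketched as follows (entry format and names are hypothetical): the `end` marker proves the compacted file was committed, but once the WAL segment holding it is rolled away, replay can no longer distinguish that file from a partial one by the WAL alone.

```python
# Hypothetical sketch of the WAL-roll problem: replay trusts a compacted
# HFile only if its "end" compaction event marker is still in the WAL.

def marker_present(wal_entries, hfile):
    """True if the end-compaction marker for hfile survives in the WAL."""
    return ("COMPACTION_END", hfile) in wal_entries

wal = [("EDIT", "row1"), ("COMPACTION_END", "hfile-c")]
trusted = marker_present(wal, "hfile-c")   # marker survived: file trusted
rolled = marker_present([], "hfile-c")     # WAL rolled: marker lost
```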

This is a good discussion; I will record the points above as considerations 
for the related milestone.





> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction, to improve write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see the attachments for the proposal and the initial results captured 
> with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and 
> RUN, and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency-check dependencies (e.g. DynamoDB) required by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
