[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164090#comment-17164090
 ] 

Anoop Sam John edited comment on HBASE-24749 at 7/24/20, 2:33 AM:
------------------------------------------------------------------

bq. Can you expand on how we can get in a situation where a partial file is 
written? I'm trying to see if there are any failure modes we haven't thought of. 
If the case is a complete file written to the data directory, is there harm in 
picking up the new file (even if it hasn't successfully committed to the SFM)?
That point was based on another direction Stack was suggesting. I am not sure 
whether Stack suggested it for the META table alone or for all tables: i.e., 
use the WAL event markers to know whether an HFile is committed or not. If, 
during WAL replay, we see a flush begin marker and later a flush complete 
marker for a file, it is a committed file. If there are no markers at all for 
a file, it is an old existing file. If there is only a begin marker but no 
complete marker, the file is not committed, and we can ignore it while 
reopening the region. The same applies to compaction. One issue there was: 
what if the WAL file containing the begin marker got rolled and deleted? We 
would lose track. But if that can be controlled (e.g. with a dedicated WAL 
for these event markers), this is also a viable direction, no? We could avoid 
storing the whole file list in META, and avoid the question of how to handle 
META's own file list. Storing it in zk is not a direction.
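The replay-time decision described above could be sketched roughly like this. 
This is only a minimal illustration of the idea; the marker names, class, and 
methods are hypothetical and are not actual HBase code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: during WAL replay, classify each HFile by the flush event
// markers seen for it. "FLUSH_START"/"FLUSH_COMMIT" stand in for the
// begin/complete markers; the same scheme would apply to compaction.
public class MarkerReplay {
  enum State { BEGIN_SEEN, COMMITTED }

  private final Map<String, State> fileStates = new HashMap<>();

  // Called for each event marker encountered while replaying the WAL.
  void onMarker(String marker, String file) {
    if (marker.equals("FLUSH_START")) {
      fileStates.put(file, State.BEGIN_SEEN);
    } else if (marker.equals("FLUSH_COMMIT")) {
      fileStates.put(file, State.COMMITTED);
    }
  }

  // Decide whether to pick up a file found in the data dir on region reopen.
  boolean shouldPickUp(String file) {
    State s = fileStates.get(file);
    // No markers at all: an old, pre-existing file -> keep it.
    if (s == null) {
      return true;
    }
    // Begin + complete seen: committed file -> keep it.
    // Begin only: the flush never committed -> ignore the partial file.
    return s == State.COMMITTED;
  }
}
```

The caveat from the comment applies: this only works if the WAL holding the 
begin marker has not been rolled and deleted before replay.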



> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see attached for the proposal and the initial results captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we have 
> missed anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
> 4. Remove consistency check dependencies (e.g. DynamoDB) used by the file 
> system implementation
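For illustration, the commit-stage rename that the proposal removes, versus a 
direct insert with persisted file tracking, might be sketched as follows. All 
paths and helper names here are hypothetical, using plain java.nio in place of 
the actual HBase store engine and Hadoop FileSystem APIs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Sketch contrasting the classic .tmp-then-rename commit with the
// proposed direct insert. Not actual HBase code.
public class CommitPaths {
  // Classic path: write under .tmp, then rename into the data directory.
  // On HDFS the rename is a cheap metadata op; on an object store it is a
  // copy + delete, and an immediate read after the rename may be
  // inconsistent.
  static Path commitViaTmp(Path storeDir, byte[] hfileBytes, String name)
      throws IOException {
    Path tmp = storeDir.resolve(".tmp").resolve(name);
    Files.createDirectories(tmp.getParent());
    Files.write(tmp, hfileBytes);
    Path dst = storeDir.resolve(name);
    return Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
  }

  // Proposed path: write the HFile directly into the data directory and
  // record it in the persisted tracking (a plain list stands in for the
  // persisted in-memory HFile tracking).
  static Path directInsert(Path storeDir, byte[] hfileBytes, String name,
                           List<String> trackedFiles) throws IOException {
    Files.createDirectories(storeDir);
    Path dst = storeDir.resolve(name);
    Files.write(dst, hfileBytes);
    trackedFiles.add(name); // the "commit" is recording the file in tracking
    return dst;
  }
}
```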



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
