[ https://issues.apache.org/jira/browse/HBASE-18358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082802#comment-16082802 ]

Gary Helmling commented on HBASE-18358:
---------------------------------------

Thanks for putting up the patch, Ted.

Looking over the approach, I'm not sure this is sufficient, though.

HRegion::flushcache() could return CANNOT_FLUSH when only a single memstore is 
being flushed from a multi-column-family table.  In this case, we will wait for 
the flush of that single memstore to complete, but I think the other memstores 
could remain unflushed and would not be part of the snapshot.

I think we have a couple options:
# in FlushSnapshotSubprocedure, we could still call HRegion::waitForFlushes(), 
then retry calling HRegion::flush(true) a number of times until we get a result 
of FLUSHED_NO_COMPACTION_NEEDED | FLUSHED_COMPACTION_NEEDED | 
CANNOT_FLUSH_MEMSTORE_EMPTY.
# in FlushSnapshotSubprocedure, we could also call HRegion::getReadpoint() 
prior to the flush request and then check HRegion::getMaxFlushedSeqId() after 
calling HRegion::waitForFlushes() to see if we need to retry the call to 
HRegion::flush().  If the max flushed seq ID >= the read point captured at the 
start, then I think we can guarantee that all writes acknowledged at the start 
of the snapshot have been persisted.
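Roughly, the two options could look like the sketch below.  This is only an 
illustration under stated assumptions, not a patch: FlushableRegion, 
FlushOutcome, FakeRegion and the maxAttempts retry cap are hypothetical 
stand-ins so the example compiles on its own.  Only the method names and the 
enum constant names are taken from the options above; the real HRegion and 
FlushResult types have different signatures.

```java
import java.util.EnumSet;
import java.util.Set;

public class SnapshotFlushSketch {
    // Hypothetical stand-in for HBase's FlushResult.Result constants.
    enum FlushOutcome {
        FLUSHED_NO_COMPACTION_NEEDED,
        FLUSHED_COMPACTION_NEEDED,
        CANNOT_FLUSH_MEMSTORE_EMPTY,
        CANNOT_FLUSH
    }

    // Hypothetical minimal view of the HRegion methods named in the comment.
    interface FlushableRegion {
        long getReadpoint();
        long getMaxFlushedSeqId();
        void waitForFlushes();
        FlushOutcome flush(boolean force);
    }

    // Outcomes that mean all memstores were flushed, or there was nothing to flush.
    static final Set<FlushOutcome> DONE = EnumSet.of(
            FlushOutcome.FLUSHED_NO_COMPACTION_NEEDED,
            FlushOutcome.FLUSHED_COMPACTION_NEEDED,
            FlushOutcome.CANNOT_FLUSH_MEMSTORE_EMPTY);

    // Option 1: retry flush(true), waiting for any concurrent flush between attempts,
    // until the outcome is one of the terminal results listed above.
    static boolean flushWithRetries(FlushableRegion region, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            FlushOutcome outcome = region.flush(true);
            if (DONE.contains(outcome)) {
                return true;
            }
            // CANNOT_FLUSH: another (possibly single-store) flush is running.
            region.waitForFlushes();
        }
        return false;
    }

    // Option 2: capture the read point before flushing, then retry until the
    // max flushed sequence id has caught up with it.
    static boolean flushUntilReadpointPersisted(FlushableRegion region, int maxAttempts) {
        long readpoint = region.getReadpoint();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            region.flush(true);
            region.waitForFlushes();
            if (region.getMaxFlushedSeqId() >= readpoint) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical in-memory region: the first `busyAttempts` flush calls return
    // CANNOT_FLUSH, as if a concurrent flush were already in progress.
    static class FakeRegion implements FlushableRegion {
        private int busyAttempts;
        private final long lastAckedSeqId = 100; // highest acknowledged write
        private long maxFlushedSeqId = 0;        // highest persisted write

        FakeRegion(int busyAttempts) {
            this.busyAttempts = busyAttempts;
        }

        public long getReadpoint() {
            return lastAckedSeqId;
        }

        public long getMaxFlushedSeqId() {
            return maxFlushedSeqId;
        }

        public void waitForFlushes() {
            // Nothing actually runs concurrently in this fake.
        }

        public FlushOutcome flush(boolean force) {
            if (busyAttempts > 0) {
                busyAttempts--;
                return FlushOutcome.CANNOT_FLUSH;
            }
            maxFlushedSeqId = lastAckedSeqId;
            return FlushOutcome.FLUSHED_NO_COMPACTION_NEEDED;
        }
    }
}
```

The second option checks persisted state (max flushed seq ID against the read 
point taken up front) rather than interpreting the flush return code, so a 
CANNOT_FLUSH caused by a concurrent partial flush is handled by simply retrying 
until the invariant holds.  In both cases the retry cap is an assumption; the 
subprocedure would presumably fail the snapshot if it is exhausted.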

Sorry to expand the scope here.  This is now well beyond a backport.  Let me 
know your thoughts on these approaches.

> Backport HBASE-18099 'FlushSnapshotSubprocedure should wait for concurrent 
> Region#flush() to finish' to branch-1.3
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18358
>                 URL: https://issues.apache.org/jira/browse/HBASE-18358
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 18358.branch-1.3.patch
>
>
> HBASE-18099 was only integrated to branch-1 and above in consideration of 
> backward compatibility.
> This issue is to backport the fix to branch-1.3 and branch-1.2.
> Quoting Gary's suggestion from the tail of HBASE-18099:
> {quote}
> Sure, don't add the method to Region, just to HRegion, check for an instance 
> of HRegion in FlushSnapshotSubprocedure and cast the instance before calling 
> the method.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
