[
https://issues.apache.org/jira/browse/HBASE-18358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082802#comment-16082802
]
Gary Helmling commented on HBASE-18358:
---------------------------------------
Thanks for putting up the patch, Ted.
Looking over the approach, I'm not sure this is sufficient, though.
HRegion::flushcache() could return CANNOT_FLUSH when only a single memstore is
being flushed from a multi-column-family table. In this case, we will wait for
that single memstore flush to complete, but I think the other memstores could
remain unflushed and would not be part of the snapshot.
I think we have a couple of options:
# in FlushSnapshotSubprocedure, we could still call HRegion::waitForFlushes(),
then retry calling HRegion::flush(true) a number of times until we get a result
of FLUSHED_NO_COMPACTION_NEEDED | FLUSHED_COMPACTION_NEEDED |
CANNOT_FLUSH_MEMSTORE_EMPTY.
# in FlushSnapshotSubprocedure, we could also call HRegion::getReadpoint()
prior to the flush request and then check HRegion::getMaxFlushedSeqId() after
calling HRegion::waitForFlushes() to see if we need to retry the call to
HRegion::flush(). If the max flushed seq ID >= readpoint at the start, then I
think we can guarantee that all acknowledged writes at the start of the
snapshot have been persisted.
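To make the two options concrete, here is a rough sketch of both retry loops. It is only an illustration under stated assumptions: RegionLike, BusyRegion, FlushRetrySketch, and the maxAttempts parameter are hypothetical stand-ins, not the real HRegion API, and the actual method signatures on branch-1.3 may differ.

```java
/** Sketch only: hypothetical stand-ins for the HRegion calls named above. */
class FlushRetrySketch {

    enum FlushResult {
        FLUSHED_NO_COMPACTION_NEEDED, FLUSHED_COMPACTION_NEEDED,
        CANNOT_FLUSH_MEMSTORE_EMPTY, CANNOT_FLUSH
    }

    /** Assumed minimal view of a region; not the real HBase interface. */
    interface RegionLike {
        FlushResult flush(boolean force);
        void waitForFlushes();
        long getReadpoint();
        long getMaxFlushedSeqId();
    }

    /** Option 1: wait for in-progress flushes, then retry until the result is not CANNOT_FLUSH. */
    static boolean flushWithRetry(RegionLike region, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            region.waitForFlushes();
            if (region.flush(true) != FlushResult.CANNOT_FLUSH) {
                return true; // flushed, or nothing left in the memstores
            }
        }
        return false; // caller decides whether to fail the snapshot
    }

    /** Option 2: record the read point first, retry until the max flushed seq ID reaches it. */
    static boolean flushUntilReadpointPersisted(RegionLike region, int maxAttempts) {
        long readpointAtStart = region.getReadpoint();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            region.flush(true);
            region.waitForFlushes();
            if (region.getMaxFlushedSeqId() >= readpointAtStart) {
                return true; // every write visible at the start is persisted
            }
        }
        return false;
    }

    /** Fake region that reports CANNOT_FLUSH for the first busyAttempts flush calls. */
    static class BusyRegion implements RegionLike {
        private int busyAttempts;
        private final long readpoint;
        private long maxFlushedSeqId = -1L;
        int flushCalls = 0;

        BusyRegion(int busyAttempts, long readpoint) {
            this.busyAttempts = busyAttempts;
            this.readpoint = readpoint;
        }

        public FlushResult flush(boolean force) {
            flushCalls++;
            if (busyAttempts > 0) {
                busyAttempts--; // simulates a concurrent flush blocking this request
                return FlushResult.CANNOT_FLUSH;
            }
            maxFlushedSeqId = readpoint; // a full flush persists everything up to the read point
            return FlushResult.FLUSHED_NO_COMPACTION_NEEDED;
        }

        public void waitForFlushes() { /* nothing to wait for in the fake */ }
        public long getReadpoint() { return readpoint; }
        public long getMaxFlushedSeqId() { return maxFlushedSeqId; }
    }

    public static void main(String[] args) {
        System.out.println("option 1: " + flushWithRetry(new BusyRegion(2, 100L), 5));
        System.out.println("option 2: " + flushUntilReadpointPersisted(new BusyRegion(1, 100L), 3));
    }
}
```

The second loop does not depend on the flush result at all, which is why it may be the more robust of the two: it only asks whether everything visible at the start of the snapshot has reached disk.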
Sorry to expand the scope here. This is now well beyond a backport. Let me
know your thoughts on these approaches.
> Backport HBASE-18099 'FlushSnapshotSubprocedure should wait for concurrent
> Region#flush() to finish' to branch-1.3
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18358
> URL: https://issues.apache.org/jira/browse/HBASE-18358
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Critical
> Attachments: 18358.branch-1.3.patch
>
>
> HBASE-18099 was only integrated to branch-1 and above in consideration of
> backward compatibility.
> This issue is to backport the fix to branch-1.3 and branch-1.2.
> Quoting Gary's suggestion from the tail of HBASE-18099 :
> {quote}
> Sure, don't add the method to Region, just to HRegion, check for an instance
> of HRegion in FlushSnapshotSubprocedure and cast the instance before calling
> the method.
> {quote}