[
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383887#comment-16383887
]
Ted Yu commented on HBASE-20090:
--------------------------------
With more debug logging, we can see the scenario where the exception was
triggered.
{code}
2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit:
Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085.,
compaction_queue=(0:0), split_queue=1
2018-03-02 17:28:30,098 DEBUG
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020]
regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info
size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
2018-03-02 17:28:30,296 INFO
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020]
regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher:
Flush thread woke up because memory above low water=381.5 M
2018-03-02 17:28:30,297 INFO
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020]
regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher:
region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size
400432696
2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher:
region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher:
Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal
Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher:
wake up by WAKEUPFLUSH_INSTANCE
2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher:
Nothing to flush for
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher:
Excluding unflushable region
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to
find a different region to flush.
{code}
Region 0453f29030757eedb6e6a1c57e88c085 was being split.
By the time MemStoreFlusher reached this call, that region was no longer a
flush candidate:
{code}
HRegion bestFlushableRegion =
getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
{code}
So the other region,
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined
next. Since that region was not receiving writes, its memstore size was 0 and
the (current) Preconditions check failed.
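The failing check can be replayed in isolation. The sketch below is a
standalone reconstruction, not the actual HBase code: the `checkState` helper
stands in for Guava's `Preconditions.checkState`, and the variable values
mimic the scenario above (splitting region excluded, remaining region with an
empty memstore):

```java
public class FlushCandidateDemo {
    // Minimal stand-in for com.google.common.base.Preconditions.checkState,
    // so the example runs without a Guava dependency.
    static void checkState(boolean expression) {
        if (!expression) {
            throw new IllegalStateException();
        }
    }

    // Replays the state after the splitting region is excluded: no flushable
    // region remains, and the only other region (atlas_janus) has memstore size 0.
    static String runCheck() {
        Object regionToFlush = null;      // 0453f... was excluded (mid-split)
        long regionToFlushSize = 0;
        Object bestRegionReplica = null;  // no region-replica candidate either
        long bestRegionReplicaSize = 0;
        try {
            checkState(
                (regionToFlush != null && regionToFlushSize > 0)
                || (bestRegionReplica != null && bestRegionReplicaSize > 0));
            return "check passed";
        } catch (IllegalStateException ise) {
            // This is the exception that FlushHandler.run's catch block
            // later logs as "Cache flusher failed for entry ...".
            return "IllegalStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(runCheck());
    }
}
```

With both candidates empty, neither disjunct holds, so the check throws
IllegalStateException rather than letting the flusher fall through quietly.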
The proposed fix is to convert the Preconditions check into a normal log
statement.
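A hedged sketch of that shape follows; the variable names are taken from the
snippet below, but the method signature, log message, and return value are
illustrative assumptions, not the attached patch:

```java
import java.util.logging.Logger;

public class FlushFixSketch {
    private static final Logger LOG = Logger.getLogger("MemStoreFlusher");

    // Hypothetical shape of flushOneForGlobalPressure() after the fix:
    // the Preconditions.checkState is replaced by a logged early return,
    // so the caller retries on the next wakeup instead of catching an
    // IllegalStateException.
    static boolean flushOneForGlobalPressure(Object regionToFlush, long regionToFlushSize,
            Object bestRegionReplica, long bestRegionReplicaSize) {
        if (!((regionToFlush != null && regionToFlushSize > 0)
            || (bestRegionReplica != null && bestRegionReplicaSize > 0))) {
            LOG.info("No flushable region found; excluded regions may be mid-split "
                + "or have empty memstores. Will retry on the next wakeup.");
            return false; // nothing flushed this round; no exception thrown
        }
        // ... real code would flush the chosen region here ...
        return true;
    }

    public static void main(String[] args) {
        // The atlas_janus scenario: no candidate with a non-empty memstore.
        System.out.println(flushOneForGlobalPressure(null, 0, null, 0));
    }
}
```

Returning false here keeps the flusher thread on its normal path, so a real
FlushRegionEntry queued behind the wakeup is still processed.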
> Properly handle Preconditions check failure in
> MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
> wakeupPending.set(false); // allow someone to wake us up again
> fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
> if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
> if (!flushOneForGlobalPressure()) {
> ...
> FlushRegionEntry fre = (FlushRegionEntry) fqe;
> if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
> LOG.error("Cache flusher failed for entry " + fqe, ex);
> if (!server.checkFileSystem()) {
> break;
> }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
> Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, the resulting IllegalStateException is
> caught by the catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)