[ 
https://issues.apache.org/jira/browse/HBASE-17384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-17384.
---------------------------
    Resolution: Won't Fix

No work was done on this improvement, and it is better to fix why we are STUCK
than to add a workaround.

> Consider aborting region server when MVCC#waitForRead() gets stuck
> ------------------------------------------------------------------
>
>                 Key: HBASE-17384
>                 URL: https://issues.apache.org/jira/browse/HBASE-17384
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Major
>         Attachments: testHRegionWithInMemoryFlush.out
>
>
> From https://builds.apache.org/job/PreCommit-HBASE-Build/5072/testReport/org.apache.hadoop.hbase.regionserver/TestHRegionWithInMemoryFlush/org_apache_hadoop_hbase_regionserver_TestHRegionWithInMemoryFlush/ :
> {code}
> org.junit.runners.model.TestTimedOutException: test timed out after 10 minutes
>       at java.lang.Object.wait(Native Method)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:218)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:149)
>       at org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2732)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2447)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2343)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2304)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1601)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1506)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1456)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:374)
>       at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3839)
> {code}
> As can be seen from test output:
> {code}
> 2016-12-28 13:43:28,379 INFO  [Time-limited test] regionserver.HStore(1431): Completed major compaction of 1 (all) file(s) in family1 of testWritesWhileScanning,,1482932605883.2e46061b97a54d7f8434c4a705b3c4a2. into 255e7eb61cfc4945ac5887957d39b1fe(size=98.0 K), total size for store is 98.0 K
> ...[truncated 4062267 bytes]...
> TUCK: MultiVersionConcurrencyControl{readPoint=1090, writePoint=1093}
> 2016-12-28 13:48:29,396 WARN  [Time-limited test] regionserver.MultiVersionConcurrencyControl(214): STUCK: MultiVersionConcurrencyControl{readPoint=1090, writePoint=1093}
> 2016-12-28 13:48:30,406 WARN  [Time-limited test] regionserver.MultiVersionConcurrencyControl(214): STUCK: MultiVersionConcurrencyControl{readPoint=1090, writePoint=1093}
> 2016-12-28 13:48:31,416 WARN  [Time-limited test] regionserver.MultiVersionConcurrencyControl(214): STUCK: MultiVersionConcurrencyControl{readPoint=1090, writePoint=1093}
> {code}
> At least 5 minutes passed with the above log showing waitForRead() stuck.
> Since the flush is blocked, we should consider aborting the region server
> when waitForRead() stays stuck for an extended period of time.
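
For illustration only, the proposal in the description amounts to putting a deadline on the read-point wait and letting the caller decide what to do when it expires. The sketch below is a simplified, hypothetical stand-in, not HBase's MultiVersionConcurrencyControl: the class BoundedReadPointWait, its methods, and the pollMillis and maxWaitMillis parameters are all assumptions made for this example.

```java
/** Hypothetical, simplified read-point tracker; NOT the HBase implementation. */
final class BoundedReadPointWait {
  private final Object lock = new Object();
  private long readPoint;

  BoundedReadPointWait(long initialReadPoint) {
    this.readPoint = initialReadPoint;
  }

  /** Advances the read point (never backwards) and wakes any waiters. */
  void advanceTo(long newReadPoint) {
    synchronized (lock) {
      if (newReadPoint > readPoint) {
        readPoint = newReadPoint;
        lock.notifyAll();
      }
    }
  }

  /**
   * Waits until readPoint >= target or until maxWaitMillis has elapsed.
   * Returns true if the read point caught up; false on timeout or interrupt,
   * so the caller can decide whether to escalate (for example, by aborting).
   */
  boolean waitForRead(long target, long pollMillis, long maxWaitMillis) {
    final long deadlineNanos = System.nanoTime() + maxWaitMillis * 1_000_000L;
    synchronized (lock) {
      while (readPoint < target) {
        long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
        if (remainingMillis <= 0) {
          return false; // deadline passed while still behind the target
        }
        try {
          // Wait at most one poll interval, and never past the deadline.
          lock.wait(Math.max(1L, Math.min(pollMillis, remainingMillis)));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          return false;
        }
        if (readPoint < target) {
          // Logged on every wake-up where the read point is still behind.
          System.err.println("STUCK: readPoint=" + readPoint + ", target=" + target);
        }
      }
      return true;
    }
  }
}
```

A caller could treat a false return as the signal to escalate. As the resolution comment notes, this only works around a stuck read point and does not explain why it stopped advancing.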



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)