[
https://issues.apache.org/jira/browse/HBASE-26438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443100#comment-17443100
]
Hudson commented on HBASE-26438:
--------------------------------
Results for branch master
[build #442 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/442/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/442/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/442/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/442/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> Fix flaky test TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize
> ----------------------------------------------------------------------------
>
> Key: HBASE-26438
> URL: https://issues.apache.org/jira/browse/HBASE-26438
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-alpha-1, 2.4.8
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.9
>
>
> {{TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize}} sometimes
> fails with the following stack trace:
> {code:java}
> org.apache.hadoop.hbase.DroppedSnapshotException: Failed clearing memory after 6 attempts on region: table,,1636366699236.cc87b170ebf180005d5c4bf09d404b3d.
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1788)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1595)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1563)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1528)
> at org.apache.hadoop.hbase.regionserver.TestHStore.tearDown(TestHStore.java:661)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> If I modify the test so that the test method finishes only after the
> in-memory compaction has completed, the test fails every time with the same
> stack trace; if the test method finishes before the in-memory compaction
> starts, the test always succeeds.
> Why the test fails once the in-memory compaction has completed is somewhat
> subtle:
> 1. Because {{TestHStore}} only tests {{HStore}}, the following
> {{TestHStore.init}} method creates the new store separately and does not
> add it to {{HRegion.stores}}:
> {code:java}
> private HStore init(String methodName, Configuration conf,
> TableDescriptorBuilder builder,
> ColumnFamilyDescriptor hcd, MyStoreHook hook, boolean switchToPread)
> throws IOException {
> initHRegion(methodName, conf, builder, hcd, hook, switchToPread);
> if (hook == null) {
> store = new HStore(region, hcd, conf, false);
> } else {
> store = new MyStore(region, hcd, conf, hook, switchToPread);
> }
> return store;
> }
> {code}
> 2. When a cell is added to the store, {{HStore.add}} does not increase
> {{HRegion.memStoreSizing}}, so {{HRegion.memStoreSizing}} stays at 0. During
> in-memory compaction, however, the cell is forcibly copied into MSLAB by
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} at line 218 of
> {{CellChunkImmutableSegment.reinitializeCellSet}} below, so the type of the
> cell changes from {{KeyValue}} to {{NoTagByteBufferChunkKeyValue}}, whose size
> is 4 bytes larger, and {{HRegion.memStoreSizing}} becomes 4:
> {code:java}
> 211 try {
> 212 while ((curCell = segmentScanner.next()) != null) {
> 213 assert(curCell instanceof ExtendedCell);
> 214 if (((ExtendedCell)curCell).getChunkId() ==
> ExtendedCell.CELL_NOT_BASED_ON_CHUNK) {
> 215 // CellChunkMap assumes all cells are allocated on MSLAB.
> 216 // Therefore, cells which are not allocated on MSLAB initially,
> 217 // are copied into MSLAB here.
> 218 curCell = copyCellIntoMSLAB(curCell, memstoreSizing);
> 219 }
> {code}
> 3. When the region closes, the following {{HRegion.doClose}} code finds at
> line 1772 that {{HRegion.memStoreSizing}} is not zero and tries to flush the
> region, but {{HRegion.stores}} is empty, so the flush has no effect. Once
> {{failedfFlushCount}} exceeds 5, {{DroppedSnapshotException}} is
> thrown:
> {code:java}
> 1771 long remainingSize = this.memStoreSizing.getDataSize();
> 1772 while (remainingSize > 0) {
> 1773 try {
> 1774 internalFlushcache(status);
> 1775 if(flushCount >0) {
> 1776 LOG.info("Running extra flush, " + flushCount +
> 1777 " (carrying snapshot?) " + this);
> 1778 }
> 1779 flushCount++;
> 1780 tmp = this.memStoreSizing.getDataSize();
> 1781 if (tmp >= remainingSize) {
> 1782 failedfFlushCount++;
> 1783 }
> 1784 remainingSize = tmp;
> 1785 if (failedfFlushCount > 5) {
> 1786 // If we failed 5 times and are unable to clear memory, abort
> 1787 // so we do not lose data
> 1788 throw new DroppedSnapshotException("Failed clearing memory after " +
> 1789 flushCount + " attempts on region: " +
> 1790 Bytes.toStringBinary(getRegionInfo().getRegionName()));
> 1791 }
> {code}
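The accounting drift described in step 2 can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not HBase code: `SizingDriftSketch`, `DetachedStore`, and `regionSizeAfter` are hypothetical names, and the 4-byte overhead is taken from the description above rather than computed from real cell layouts.

```java
import java.util.concurrent.atomic.AtomicLong;

class SizingDriftSketch {
    /** Hypothetical stand-in for a store that is created on its own in a test. */
    static final class DetachedStore {
        private final AtomicLong regionDataSize; // models HRegion.memStoreSizing
        private long storeDataSize;              // the store's own view of its data size

        DetachedStore(AtomicLong regionDataSize) {
            this.regionDataSize = regionDataSize;
        }

        /** Adding a cell updates only the store; the region counter is untouched. */
        void add(long cellSize) {
            storeDataSize += cellSize;
        }

        /** Assumption: copying the cell into a chunk reports only the size delta to the region. */
        void compactInMemory(long copyOverhead) {
            storeDataSize += copyOverhead;
            regionDataSize.addAndGet(copyOverhead);
        }

        long storeDataSize() {
            return storeDataSize;
        }
    }

    /** Returns the region-level size after one add and an optional in-memory compaction. */
    static long regionSizeAfter(long cellSize, long copyOverhead, boolean compact) {
        AtomicLong regionDataSize = new AtomicLong();
        DetachedStore store = new DetachedStore(regionDataSize);
        store.add(cellSize);
        if (compact) {
            store.compactInMemory(copyOverhead);
        }
        return regionDataSize.get();
    }

    public static void main(String[] args) {
        System.out.println(regionSizeAfter(100, 4, false)); // prints 0
        System.out.println(regionSizeAfter(100, 4, true));  // prints 4
    }
}
```

In this model the region counter is nonzero only when the compaction has run, which matches the observation that the test passes if it finishes before the in-memory compaction starts.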
> My fix is to add the newly created {{HStore}} to {{HRegion.stores}} in
> {{TestHStore.init}}.
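To see why registering the store matters for the close loop, here is a simplified standalone model of the retry logic quoted from {{HRegion.doClose}}. It is a sketch, not the actual patch: `CloseLoopSketch`, `Region`, `Store`, and `closeOutcome` are hypothetical, the flush is assumed to release exactly the registered stores' sizes from the region counter, and `IllegalStateException` stands in for {{DroppedSnapshotException}}.

```java
import java.util.ArrayList;
import java.util.List;

class CloseLoopSketch {
    /** Hypothetical store holding some unflushed bytes. */
    static final class Store {
        long dataSize;

        Store(long dataSize) {
            this.dataSize = dataSize;
        }
    }

    /** Hypothetical region: a list of registered stores plus a region-level size counter. */
    static final class Region {
        final List<Store> stores = new ArrayList<>(); // models HRegion.stores
        long memStoreDataSize;                        // models HRegion.memStoreSizing

        /** Assumption: a flush only visits registered stores and releases their size. */
        void flush() {
            for (Store s : stores) {
                memStoreDataSize -= s.dataSize;
                s.dataSize = 0;
            }
        }

        /** Mirrors the retry loop: give up once more than 5 flushes free nothing. */
        int close() {
            int flushCount = 0;
            int failedFlushCount = 0;
            long remaining = memStoreDataSize;
            while (remaining > 0) {
                flush();
                flushCount++;
                long tmp = memStoreDataSize;
                if (tmp >= remaining) {
                    failedFlushCount++;
                }
                remaining = tmp;
                if (failedFlushCount > 5) {
                    throw new IllegalStateException(
                        "Failed clearing memory after " + flushCount + " attempts");
                }
            }
            return flushCount;
        }
    }

    /** Closes a region holding 4 accounted bytes, with or without the store registered. */
    static String closeOutcome(boolean registerStore) {
        Region region = new Region();
        Store store = new Store(4);
        region.memStoreDataSize = 4;
        if (registerStore) {
            region.stores.add(store); // the essence of the proposed fix
        }
        try {
            return "closed after " + region.close() + " flush(es)";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(closeOutcome(false)); // Failed clearing memory after 6 attempts
        System.out.println(closeOutcome(true));  // closed after 1 flush(es)
    }
}
```

With no store registered, every flush is a no-op and the sixth failed attempt raises the error, matching the "after 6 attempts" message in the stack trace; with the store registered, the model clears the counter on the first flush.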
--
This message was sent by Atlassian Jira
(v8.20.1#820001)