[ 
https://issues.apache.org/jira/browse/HBASE-26438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443167#comment-17443167
 ] 

Hudson commented on HBASE-26438:
--------------------------------

Results for branch branch-2
        [build #393 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/393/]:
 (x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/393/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/393/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/393/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/393/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Fix  flaky test TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26438
>                 URL: https://issues.apache.org/jira/browse/HBASE-26438
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha-1, 2.4.8
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.9
>
>
> {{TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize}} sometimes is 
> failed and the error stack is :
> {code:java}
> org.apache.hadoop.hbase.DroppedSnapshotException: Failed clearing memory 
> after 6 attempts on region: 
> table,,1636366699236.cc87b170ebf180005d5c4bf09d404b3d.
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1788)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1595)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1563)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1528)
>       at 
> org.apache.hadoop.hbase.regionserver.TestHStore.tearDown(TestHStore.java:661)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>       at 
> org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>       at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>       at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>       at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at 
> org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> If I modify the test code to make the test method finished after waiting for 
> in memory compacting completed, the test would fail all the time with the 
> same error stack, and if the test method finished before the in memory 
> compacting start, the test would always success.  
> Why the test would fail after the in memory compacting completed is somewhat 
> subtle:
> 1. Because  {{TestHStore}} only test  {{HStore}} , so in following 
> {{TestHStore.init}} method the new store is created separately and does not 
> added to {{HRegion.stores}},
> {code:java}
> private HStore init(String methodName, Configuration conf, 
> TableDescriptorBuilder builder,
>       ColumnFamilyDescriptor hcd, MyStoreHook hook, boolean switchToPread) 
> throws IOException {
>     initHRegion(methodName, conf, builder, hcd, hook, switchToPread);
>     if (hook == null) {
>       store = new HStore(region, hcd, conf, false);
>     } else {
>       store = new MyStore(region, hcd, conf, hook, switchToPread);
>     }
>     return store;
>   }
> {code}
> 2.When cell is added to Store,{{HStore.add}} method would not increase 
> {{HRegion.memStoreSizing}}, {{HRegion.memStoreSizing}} is always 0, but when 
> in memory compaction, the cell is forced to copy to MSLAB by 
> {{CellChunkImmutableSegment.copyCellIntoMSLAB}} in following line 218 in 
> {{CellChunkImmutableSegment.reinitializeCellSet}} ,so the type of the cell is 
> changed from {{KeyValue}} to {{NoTagByteBufferChunkKeyValue}},which size 
> increased 4 byte and {{HRegion.memStoreSizing}} is 4 :
> {code:java}
>  211  try {
>  212    while ((curCell = segmentScanner.next()) != null) {
>  213      assert(curCell instanceof ExtendedCell);
>  214      if (((ExtendedCell)curCell).getChunkId() == 
> ExtendedCell.CELL_NOT_BASED_ON_CHUNK) {
>  215        // CellChunkMap assumes all cells are allocated on MSLAB.
>  216         // Therefore, cells which are not allocated on MSLAB initially,
>  217          // are copied into MSLAB here.
>  218         curCell = copyCellIntoMSLAB(curCell, memstoreSizing);
>  219      }
> {code}
> 3. When the region close, following {{HRegion.doClose}} finds that  
> {{HRegion.memStoreSizing}}  is not zero in line 1772 and tries to flush the 
> region,but the {{HRegion.stores}} is empty so flushing takes no effect, when 
> the {{failedfFlushCount}} bigger than 5, {{DroppedSnapshotException}} is 
> thrown.
> {code:java}
> 1771       long remainingSize = this.memStoreSizing.getDataSize();
> 1772        while (remainingSize > 0) {
> 1773          try {
> 1774            internalFlushcache(status);
> 1775            if(flushCount >0) {
> 1776              LOG.info("Running extra flush, " + flushCount +
> 1777                  " (carrying snapshot?) " + this);
> 1778            }
> 1779            flushCount++;
> 1780            tmp = this.memStoreSizing.getDataSize();
> 1781            if (tmp >= remainingSize) {
> 1782              failedfFlushCount++;
> 1783            }
> 1784            remainingSize = tmp;
> 1785            if (failedfFlushCount > 5) {
> 1786              // If we failed 5 times and are unable to clear memory, 
> abort
> 1787              // so we do not lose data
> 1788              throw new DroppedSnapshotException("Failed clearing memory 
> after " +
> 1789                  flushCount + " attempts on region: " +
> 1790                  Bytes.toStringBinary(getRegionInfo().getRegionName()));
> 1791            }
> {code}
> My fix is adding the new created {{HStore}} to {{HRegion.stores}} in 
> {{TestHStore.init}} 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to