[ https://issues.apache.org/jira/browse/HBASE-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819390#comment-17819390 ]
Bryan Beaudreault commented on HBASE-28390:
-------------------------------------------

[~apurtell] curious if you have any ideas here? I feel like this is sort of a bug in BlockCompressorStream. It would be better to reset() at the start of the loop, which would let the compressor's state become finished once it has compressed the entire input. But the problem with fixing that bug is that it would be backwards-incompatible with all existing compressed data for inputs larger than the block size. Then again, I'm not sure how it's working to begin with, since BlockDecompressorStream does not know to read an extra 0 int. So maybe it's not a breaking fix after all?

Or maybe we can work around this in HBase. One thing we could do is not compress values larger than the max input size, and log a warning so operators know to increase it. This seems non-ideal though, and would need some sort of marker byte to denote the uncompressed values.

> WAL value compression fails for cells with large values
> -------------------------------------------------------
>
>                 Key: HBASE-28390
>                 URL: https://issues.apache.org/jira/browse/HBASE-28390
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We are testing out WAL compression and noticed that it fails for large values when both features (WAL compression and WAL value compression) are enabled. It works fine with either feature independently, but not when combined. It seems to fail for all of the value compressor types, and the failure is in the LRUDictionary of WAL key compression:
>
> {code:java}
> java.io.IOException: Error while reading 2 WAL KVs; started reading at 230 and read up to 396
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:94) ~[classes/:?]
>   at org.apache.hadoop.hbase.wal.CompressedWALTestBase.doTest(CompressedWALTestBase.java:181) ~[test-classes/:?]
>   at org.apache.hadoop.hbase.wal.CompressedWALTestBase.testForSize(CompressedWALTestBase.java:129) ~[test-classes/:?]
>   at org.apache.hadoop.hbase.wal.CompressedWALTestBase.testLarge(CompressedWALTestBase.java:94) ~[test-classes/:?]
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
>   at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
>   at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) ~[junit-4.13.2.jar:4.13.2]
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) ~[junit-4.13.2.jar:4.13.2]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>   at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.IndexOutOfBoundsException: index (21) must be less than size (1)
>   at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1371) ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>   at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:1353) ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>   at org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:153) ~[classes/:?]
>   at org.apache.hadoop.hbase.io.util.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:79) ~[classes/:?]
>   at org.apache.hadoop.hbase.io.util.LRUDictionary.getEntry(LRUDictionary.java:43) ~[classes/:?]
>   at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:366) ~[classes/:?]
>   at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:307) ~[classes/:?]
>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:66) ~[classes/:?]
>   at org.apache.hadoop.hbase.wal.WALEdit.readFromCells(WALEdit.java:313) ~[classes/:?]
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALStreamReader.next(ProtobufWALStreamReader.java:84) ~[classes/:?]
>   ... 27 more
> {code}
> We've created a unit test which reproduces the failure for each compressor type. It seems to fail around the 200 KB value size for each.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
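For illustration, the marker-byte workaround floated in the comment above could look roughly like the following sketch. This is purely hypothetical: it uses java.util.zip.Deflater/Inflater as a stand-in for HBase's actual value compressors, and the COMPRESSED/PLAIN marker bytes, the MarkerByteValueCodec name, and the MAX_COMPRESS_SIZE threshold are invented for the example, not part of the real WALCellCodec wire format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class MarkerByteValueCodec {
  // Hypothetical marker bytes; not part of any real WAL format.
  static final byte COMPRESSED = 1;
  static final byte PLAIN = 0;
  // Hypothetical stand-in for the compressor's max input size.
  static final int MAX_COMPRESS_SIZE = 256 * 1024;

  /** Compress the value, or store it as-is (with a PLAIN marker) if oversized. */
  static byte[] encode(byte[] value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (value.length > MAX_COMPRESS_SIZE) {
      // Oversized value: skip compression and warn so operators can raise the limit.
      System.err.println("WARN: value of " + value.length
          + " bytes exceeds max compressible size; storing uncompressed");
      out.write(PLAIN);
      out.write(value, 0, value.length);
    } else {
      out.write(COMPRESSED);
      Deflater deflater = new Deflater();
      deflater.setInput(value);
      deflater.finish();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      deflater.end();
    }
    return out.toByteArray();
  }

  /** Dispatch on the marker byte: pass through PLAIN values, inflate COMPRESSED ones. */
  static byte[] decode(byte[] encoded, int originalLen) throws Exception {
    if (encoded[0] == PLAIN) {
      return Arrays.copyOfRange(encoded, 1, encoded.length);
    }
    Inflater inflater = new Inflater();
    inflater.setInput(encoded, 1, encoded.length - 1);
    byte[] result = new byte[originalLen];
    int off = 0;
    while (off < originalLen && !inflater.finished()) {
      off += inflater.inflate(result, off, originalLen - off);
    }
    inflater.end();
    return result;
  }
}
```

The reader still needs one bit of framing per value (the marker byte), which is exactly the backwards-compatibility cost the comment calls non-ideal: old readers would not know to expect it.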