[ 
https://issues.apache.org/jira/browse/HBASE-27073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689306#comment-17689306
 ] 

Andrew Kyle Purtell edited comment on HBASE-27073 at 2/15/23 6:51 PM:
----------------------------------------------------------------------

bq. Why they all failed at position 65536...

Yes, it is suspicious, and possibly a buffering issue leading to a short read 
when a buffer becomes full. My comment above may be relevant:

{quote}
When working on WAL value compression a while back I remember *the first 
version used a temporary growable buffer (a ByteArrayOutputStream if I recall 
correctly) to collect all encrypted bytes of the value before submitting the 
payload to the codec*. Later in code review Bharath and I went back and forth a 
bit on a trick with input streams to reduce the number of copies. To fix this I 
would go back to the earlier approach. 
{quote}

When I said "encrypted" I meant "compressed". Sorry about that. 

This may be happening outside of value compression. Turn off value compression, 
leave only the base WAL compression enabled, and see if it still reproduces. 
However, the underlying cause would be the same if my theory is correct. First 
we read a length indicating the size of the compressed bytes to read, then we 
read until that many bytes have arrived, and only then can we submit them for 
decompression. If the input stream returns from a single read call before the 
full number of bytes has been read, we have to read from it multiple times, so 
we may need an intermediate buffer to collect the compressed bytes across those 
reads.
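
To illustrate the theory, here is a minimal sketch of the read loop I have in 
mind. It is not the actual WAL reader code: the class and method names are 
hypothetical, the 4-byte length prefix is an assumption made to keep the example 
short, and ChunkedInputStream exists only to simulate a stream that returns 
fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; none of these names come from the HBase code base.
class ShortReadSketch {

  /** Simulates a stream that returns at most maxPerRead bytes per read call. */
  static class ChunkedInputStream extends InputStream {
    private final ByteArrayInputStream delegate;
    private final int maxPerRead;

    ChunkedInputStream(byte[] data, int maxPerRead) {
      this.delegate = new ByteArrayInputStream(data);
      this.maxPerRead = maxPerRead;
    }

    @Override
    public int read() {
      return delegate.read();
    }

    @Override
    public int read(byte[] b, int off, int len) {
      // Deliberately short read: never hand back more than maxPerRead bytes.
      return delegate.read(b, off, Math.min(len, maxPerRead));
    }
  }

  /** Collects exactly length bytes in a buffer, looping over short reads. */
  static byte[] readFully(InputStream in, int length) throws IOException {
    byte[] buf = new byte[length];
    int off = 0;
    while (off < length) {
      int n = in.read(buf, off, length - off);
      if (n < 0) {
        throw new EOFException("Expected " + length + " bytes, got " + off);
      }
      off += n;
    }
    return buf;
  }

  /** Reads an assumed 4-byte length prefix, then the complete payload. */
  static byte[] readLengthPrefixed(InputStream in) throws IOException {
    int length = new DataInputStream(in).readInt();
    // Only after this returns is the payload safe to hand to a decompressor.
    return readFully(in, length);
  }
}
```

With a 100000 byte payload and maxPerRead set to 65536, a single read call 
stops at 65536 bytes, while readFully keeps reading until the whole payload has 
been collected.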


> TestReplicationValueCompressedWAL.testMultiplePuts is flaky
> -----------------------------------------------------------
>
>                 Key: HBASE-27073
>                 URL: https://issues.apache.org/jira/browse/HBASE-27073
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>         Environment: Java version: 1.8.0_322
> OS name: "linux", version: "5.10.0-13-arm64", arch: "aarch64", family: "unix"
>            Reporter: Andrew Kyle Purtell
>            Priority: Minor
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> org.apache.hadoop.hbase.replication.regionserver.TestReplicationValueCompressedWAL.testMultiplePuts
>   Run 1: TestReplicationValueCompressedWAL.testMultiplePuts:56 Waited too much time for replication
>   Run 2: PASS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
