[ 
https://issues.apache.org/jira/browse/HBASE-27637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688251#comment-17688251
 ] 

Andrew Kyle Purtell edited comment on HBASE-27637 at 2/14/23 3:06 AM:
----------------------------------------------------------------------

bq. it turned out that, if the value length is 0, then the compressed length 
will be 4, but while reading, we will read nothing so we will not read the 4 
bytes

Ah. Value compression should do nothing if the value length is zero; this is 
the code bug. There is a missing test for value.length > 0. 

It's been a while since I've looked at this code. If we unconditionally use the 
compressor to "write" 0 bytes, then the compression codec will still emit 
overhead: the Hadoop compression stream header and the compression bitstream 
header. All of that should be skipped so the value size we write on disk is 0 
and truly no value data follows the length. I see you have already taken this 
issue [~zhangduo]. Let me know if you'd rather I patch it, as this is my code 
that is not doing the correct thing here.
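
To make the missing check concrete, here is a minimal sketch of the guard, assuming a simplified record layout of a 4-byte compressed length followed by the compressed bytes. It uses the JDK's gzip streams as a stand-in for the WAL value compressor; the class GuardedValueCodec and the methods writeValue and readValue are hypothetical names for illustration and are not the HBase implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GuardedValueCodec {

  /** Writes [int compressedLength][compressed bytes]; empty values bypass the codec. */
  public static void writeValue(DataOutputStream out, byte[] value) {
    try {
      if (value.length == 0) {
        // The guard: no codec headers are emitted, so nothing follows the length.
        out.writeInt(0);
        return;
      }
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
        gz.write(value);
      }
      out.writeInt(buf.size());
      buf.writeTo(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads one value written by writeValue, consuming exactly the bytes it wrote. */
  public static byte[] readValue(DataInputStream in) {
    try {
      int compressedLength = in.readInt();
      if (compressedLength == 0) {
        // Symmetric with the writer: there is no payload to consume.
        return new byte[0];
      }
      byte[] compressed = new byte[compressedLength];
      in.readFully(compressed);
      try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
        return gz.readAllBytes(); // requires Java 9 or later
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    writeValue(out, new byte[0]);
    writeValue(out, "next".getBytes(StandardCharsets.UTF_8));

    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println("first value length: " + readValue(in).length);
    System.out.println("second value: " + new String(readValue(in), StandardCharsets.UTF_8));
  }
}
```

Because the writer and reader agree that a zero length means no payload, the value written after the empty one is read back from the correct offset.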


> Zero length value would cause value compressor read nothing and not advance 
> the position of the InputStream
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27637
>                 URL: https://issues.apache.org/jira/browse/HBASE-27637
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>
> This is a code snippet from the discussion of HBASE-27073
> {code}
>   public static void main(String[] args) throws Exception {
>     CompressionContext ctx = new CompressionContext(LRUDictionary.class, false, false, true,
>       Compression.Algorithm.GZ);
>     ValueCompressor compressor = ctx.getValueCompressor();
>     // Compress a zero-length value.
>     byte[] compressed = compressor.compress(new byte[0], 0, 0);
>     System.out.println("compressed length: " + compressed.length);
>     ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
>     // Decompress with an expected output length of zero.
>     int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
>     System.out.println("read length: " + read);
>     // Number of bytes consumed from the stream so far.
>     System.out.println("position: " + (compressed.length - bis.available()));
>   }
> {code}
> And the output is
> {noformat}
> compressed length: 20
> read length: 0
> position: 0
> {noformat}
> So it turns out that, when compressing, an empty array will still generate 
> some output bytes, but while reading, we will skip reading anything if we 
> find the output length is zero. So the next time we read from the stream, we 
> will start at the wrong position...
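
The same effect can be reproduced without HBase on the classpath. The following is a rough sketch that uses the JDK's GZIPOutputStream as a stand-in for the value compressor and a hypothetical naiveRead method that, like the read path described above, returns early when the expected output length is zero. The class and method names are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

public class EmptyValueMisalignment {

  /** Compresses data with the JDK gzip stream (a stand-in for the WAL value compressor). */
  public static byte[] gzip(byte[] data) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Hypothetical reader that returns early when no output is expected. */
  public static int naiveRead(InputStream in, int inLength, int outLength) {
    if (outLength == 0) {
      // The inLength compressed bytes are left unread in the stream.
      return 0;
    }
    throw new UnsupportedOperationException("non-empty path omitted from this sketch");
  }

  public static void main(String[] args) {
    byte[] record = gzip(new byte[0]);
    ByteArrayInputStream in = new ByteArrayInputStream(record);
    int read = naiveRead(in, record.length, 0);
    System.out.println("compressed length: " + record.length);
    System.out.println("read length: " + read);
    System.out.println("bytes left unread: " + in.available());
  }
}
```

Even for empty input the gzip stream emits its header and trailer, so the compressed length is nonzero while the reader consumes nothing; whatever is read next from the stream would begin inside those leftover bytes.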


