[ https://issues.apache.org/jira/browse/HBASE-27637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688255#comment-17688255 ]

Duo Zhang commented on HBASE-27637:
-----------------------------------

[~apurtell] Thanks for looking at this.

I think your proposal could be a separate issue, for optimizing the write
implementation. The problem here is that we have already written some cells out
like this, and we should be able to read them back. So I will provide a PR
focused on the reading part: if the outLength is zero but the inLength is not,
we need to manually skip the inLength bytes, as the BufferedInputStream will
not do this for us.
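A minimal sketch of the reader-side guard described above, using only java.io. The names `skipFully` and `skipIfEmptyValue` are hypothetical and are not the HBase API. The sketch only illustrates the idea: when outLength is zero, still consume the inLength compressed bytes so that the next read starts at the right offset.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ZeroLengthSkipSketch {
  // Hypothetical helper: consume exactly n bytes from in, or throw.
  // InputStream.skip() may skip fewer bytes than requested, so loop.
  static void skipFully(InputStream in, long n) throws IOException {
    while (n > 0) {
      long skipped = in.skip(n);
      if (skipped > 0) {
        n -= skipped;
      } else if (in.read() >= 0) {
        // skip() made no progress; fall back to reading a single byte
        n--;
      } else {
        throw new EOFException("Premature EOF, " + n + " bytes left to skip");
      }
    }
  }

  // Hypothetical reader-side guard: a zero-length value may still have
  // inLength compressed bytes on the stream, so consume them explicitly.
  // Returns true if the value was empty and has been fully handled.
  static boolean skipIfEmptyValue(InputStream in, int inLength, int outLength)
      throws IOException {
    if (outLength == 0) {
      skipFully(in, inLength);
      return true;
    }
    return false; // non-empty value: leave the stream for the decompressor
  }

  public static void main(String[] args) throws IOException {
    // Simulated stream: 20 bytes standing in for a compressed empty value,
    // followed by the first byte (0x7F) of the next record.
    byte[] data = new byte[21];
    data[20] = 0x7F;
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    System.out.println("handled: " + skipIfEmptyValue(in, 20, 0));
    System.out.println("next byte: " + in.read()); // prints "next byte: 127"
  }
}
```

A loop around `skip()` is used because a single call is allowed to skip fewer bytes than requested, which is the same kind of partial-progress behaviour that makes this bug easy to miss.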

Then you can optimize the writing part, so that zero-length values carry less
overhead. When reading, the inLength and outLength will then both be zero.
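A minimal sketch of the write-side idea, assuming each value is compressed as a standalone GZIP stream with java.util.zip. HBase's ValueCompressor works differently, so this only illustrates why an empty input still produces output bytes and how a length check before compressing avoids that. `gzip` and `compressValue` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class EmptyValueWriteSketch {
  // Plain GZIP of a value: even an empty input yields a non-empty stream
  // (GZIP header, an empty deflate block, and the CRC/size trailer).
  static byte[] gzip(byte[] value, int offset, int length) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(value, offset, length);
    }
    return bos.toByteArray();
  }

  // Hypothetical write-side optimization: emit nothing for a zero-length
  // value, so a reader would see inLength == 0 and outLength == 0.
  static byte[] compressValue(byte[] value, int offset, int length) throws IOException {
    if (length == 0) {
      return new byte[0];
    }
    return gzip(value, offset, length);
  }

  public static void main(String[] args) throws IOException {
    System.out.println("plain gzip of empty value: " + gzip(new byte[0], 0, 0).length + " bytes");
    System.out.println("with short-circuit: " + compressValue(new byte[0], 0, 0).length + " bytes");
  }
}
```

Note that WAL files written before such a change still contain non-empty payloads for empty values, which is why the read-side fix is needed regardless.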

WDYT?

Thanks.

> Zero length value would cause value compressor read nothing and not advance 
> the position of the InputStream
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27637
>                 URL: https://issues.apache.org/jira/browse/HBASE-27637
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>
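> This is a code snippet from the discussion of HBASE-27073:
> {code}
> import java.io.ByteArrayInputStream;
> import org.apache.hadoop.hbase.io.compress.Compression;
> import org.apache.hadoop.hbase.io.util.LRUDictionary;
> import org.apache.hadoop.hbase.regionserver.wal.CompressionContext;
> import org.apache.hadoop.hbase.regionserver.wal.CompressionContext.ValueCompressor;
>
>   public static void main(String[] args) throws Exception {
>     // WAL compression context with value compression enabled, using GZ
>     CompressionContext ctx =
>       new CompressionContext(LRUDictionary.class, false, false, true, Compression.Algorithm.GZ);
>     ValueCompressor compressor = ctx.getValueCompressor();
>     // Compress a zero-length value
>     byte[] compressed = compressor.compress(new byte[0], 0, 0);
>     System.out.println("compressed length: " + compressed.length);
>     // Decompress it: inLength = compressed.length, outLength = 0
>     ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
>     int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
>     System.out.println("read length: " + read);
>     // Number of input bytes actually consumed from the stream
>     System.out.println("position: " + (compressed.length - bis.available()));
>   }
> {code}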
> And the output is
> {noformat}
> compressed length: 20
> read length: 0
> position: 0
> {noformat}
> So it turns out that when compressing, an empty array still generates some
> output bytes (20 with GZ in the example above), but when reading, we skip
> reading anything if we find the output length is zero. So the next time we
> read from the stream, we start at the wrong position...
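To see why reading nothing breaks subsequent reads, here is a self-contained simulation with plain java.io streams. The record framing ([inLength][outLength][payload]) and the names `writeRecord`, `readRecord` and `secondOutLength` are hypothetical and are not the actual WAL format; the 20 zero bytes merely stand in for the compressed empty value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MisalignmentSketch {
  // Hypothetical framing: [inLength:int][outLength:int][inLength payload bytes]
  static void writeRecord(DataOutputStream out, byte[] payload, int outLength)
      throws IOException {
    out.writeInt(payload.length);
    out.writeInt(outLength);
    out.write(payload);
  }

  // Reads one record and returns its outLength. With fixed == false it mimics
  // the bug: when outLength is zero, the payload bytes are left unread.
  static int readRecord(DataInputStream in, boolean fixed) throws IOException {
    int inLength = in.readInt();
    int outLength = in.readInt();
    if (outLength == 0 && !fixed) {
      return 0; // bug: inLength bytes remain in the stream
    }
    in.readFully(new byte[inLength]); // consume the payload
    return outLength;
  }

  // Two records: an empty value whose "compressed" form is still 20 bytes,
  // followed by a 3-byte value.
  static byte[] buildStream() throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    writeRecord(out, new byte[20], 0);
    writeRecord(out, new byte[] {1, 2, 3}, 3);
    out.flush();
    return bos.toByteArray();
  }

  // Reads both records and returns the outLength seen for the second one.
  static int secondOutLength(boolean fixed) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buildStream()));
    readRecord(in, fixed);
    return readRecord(in, fixed);
  }

  public static void main(String[] args) throws IOException {
    // The buggy reader parses the leftover payload zeros as the next header.
    System.out.println("buggy reader, 2nd outLength: " + secondOutLength(false)); // 0
    System.out.println("fixed reader, 2nd outLength: " + secondOutLength(true)); // 3
  }
}
```

With the buggy reader, the second header is parsed from the unread zero bytes of the first payload, so the 3-byte value is silently misread. This is the misalignment described in the issue.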



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
