[ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878550#comment-15878550
 ] 

CHIA-PING TSAI commented on HBASE-17623:
----------------------------------------

bq. Now say when it is not required and there is no compress/encryption 
happening, we can just pass the accumulated byte[] (offset - length) to the DOS 
which is FSDataOS. Now here at this case also one copy happens. Is that really 
needed?
We can lazily copy the uncompressed data until we need to cache it.

bq. Also for doing compress/encryption, the same accumulated bytes can be 
passed and there any way we create a new compressed/encrypted byte[] which is 
written to DOS. Now temp array copy happens here and we can make that one? Even 
this patch also do not change the #copy ops but make object to stay there 
(byte[]) and reuse rather than recreate. Correct?
Yes

bq. Pls check with G1 also as Ram suggested. With diff types of experiments, we 
can make this area better. This is great work happening here. Thanks for the 
efforts.
I will make more experiments with G1. Thanks for your feedback [~anoop.hbase] 
[~ram_krish].

> Reuse the bytes array when building the hfile block
> ---------------------------------------------------
>
>                 Key: HBASE-17623
>                 URL: https://issues.apache.org/jira/browse/HBASE-17623
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: CHIA-PING TSAI
>            Assignee: CHIA-PING TSAI
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: after(snappy_hfilesize=5.04GB).png, 
> after(snappy_hfilesize=755MB).png, before(snappy_hfilesize=5.04GB).png, 
> before(snappy_hfilesize=755MB).png, HBASE-17623.branch-1.v0.patch, 
> HBASE-17623.branch-1.v1.patch, HBASE-17623.v0.patch, HBASE-17623.v1.patch, 
> HBASE-17623.v1.patch, memory allocation measurement.xlsx
>
>
> There are two improvements.
> # The uncompressedBlockBytesWithHeader and onDiskBlockBytesWithHeader should 
> maintain a bytes array which can be reused when building the hfile.
> # The uncompressedBlockBytesWithHeader/onDiskBlockBytesWithHeader is copied 
> to an new bytes array only when we need to cache the block.
> {code:title=HFileBlock.java|borderStyle=solid}
>     private void finishBlock() throws IOException {
>       if (blockType == BlockType.DATA) {
>         this.dataBlockEncoder.endBlockEncoding(dataBlockEncodingCtx, 
> userDataStream,
>             baosInMemory.getBuffer(), blockType);
>         blockType = dataBlockEncodingCtx.getBlockType();
>       }
>       userDataStream.flush();
>       // This does an array copy, so it is safe to cache this byte array when 
> cache-on-write.
>       // Header is still the empty, 'dummy' header that is yet to be filled 
> out.
>       uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>       prevOffset = prevOffsetByType[blockType.getId()];
>       // We need to set state before we can package the block up for 
> cache-on-write. In a way, the
>       // block is ready, but not yet encoded or compressed.
>       state = State.BLOCK_READY;
>       if (blockType == BlockType.DATA || blockType == BlockType.ENCODED_DATA) 
> {
>         onDiskBlockBytesWithHeader = dataBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       } else {
>         onDiskBlockBytesWithHeader = defaultBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       }
>       // Calculate how many bytes we need for checksum on the tail of the 
> block.
>       int numBytes = (int) ChecksumUtil.numBytes(
>           onDiskBlockBytesWithHeader.length,
>           fileContext.getBytesPerChecksum());
>       // Put the header for the on disk bytes; header currently is 
> unfilled-out
>       putHeader(onDiskBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, 
> onDiskBlockBytesWithHeader.length);
>       // Set the header for the uncompressed bytes (for cache-on-write) -- 
> IFF different from
>       // onDiskBlockBytesWithHeader array.
>       if (onDiskBlockBytesWithHeader != uncompressedBlockBytesWithHeader) {
>         putHeader(uncompressedBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, 
> onDiskBlockBytesWithHeader.length);
>       }
>       if (onDiskChecksum.length != numBytes) {
>         onDiskChecksum = new byte[numBytes];
>       }
>       ChecksumUtil.generateChecksums(
>           onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>           onDiskChecksum, 0, fileContext.getChecksumType(), 
> fileContext.getBytesPerChecksum());
>     }{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to