[ https://issues.apache.org/jira/browse/HBASE-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524687#comment-16524687 ]
Kuan-Po Tseng commented on HBASE-18201:
---------------------------------------

[~chia7712] patch003 contains the following changes.

Bugs(4)
- Encoder ROW_INDEX_V1 throws an error. The failure is in class EncodedDataBlock:
{code:java}
this.dataBlockEncoder.endBlockEncoding(encodingCtx, out, baosBytes);
{code}
ROW_INDEX_V1 writes _onDiskDataSize_ to _out(DataOutputStream)_, whereas the other encoders write _onDiskDataSize_ directly into _baosBytes(byte array)_. Since _onDiskDataSize_ is necessary in the next steps, we need to flush _out_ again after _endBlockEncoding_ so that _onDiskDataSize_ is written.
- DataBlockEncodingTool _checkStatistics_ could leave currentKV null; fixed.
- DataBlockEncodingTool _checkStatistics_ was missing MemstoreTS.
- _compressedStream.reset()_ should happen before _compressingStream.resetState()_, because in GZ _resetState()_ writes the header to the output stream. If _compressedStream.reset()_ is placed after _compressingStream.resetState()_, the header is lost.

Tests(1)
- Run through all of DataBlockEncodingTool with the GZ compression algorithm to make sure it actually runs, not just compiles correctly.

Docs(1)
- Document how to use DataBlockEncodingTool: which options are necessary, all the options in detail, and the result after using this tool.

Others(1)
- Rename option OPT_ENCODING_ALGORITHM to OPT_COMPRESSION_ALGORITHM.


> add UT and docs for DataBlockEncodingTool
> -----------------------------------------
>
>                 Key: HBASE-18201
>                 URL: https://issues.apache.org/jira/browse/HBASE-18201
>             Project: HBase
>          Issue Type: Task
>          Components: tooling
>            Reporter: Chia-Ping Tsai
>            Assignee: Kuan-Po Tseng
>            Priority: Minor
>              Labels: beginner
>         Attachments: HBASE-18201.master.001.patch,
> HBASE-18201.master.002.patch, HBASE-18201.master.002.patch,
> HBASE-18201.master.003.patch
>
>
> There are no examples, documentation, or tests for DataBlockEncodingTool. We should
> make it user-friendly if any use case exists. Otherwise, we should just get rid of
> it because DataBlockEncodingTool presumes that the implementation of cell
> returned from DataBlockEncoder is KeyValue. This presumption may obstruct the
> cleanup of KeyValue references in the code base of the read/write path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
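
The reset-ordering bug described in the comment (the last item under Bugs) can be illustrated with a minimal, self-contained sketch. This is not the HBase or Hadoop code: {{ResetOrderSketch}}, {{HeaderOnResetStream}}, and the two-byte {{HEADER}} are hypothetical stand-ins, assuming only what the comment states, namely that the compressor's _resetState()_ writes a header into the underlying buffer.

```java
import java.io.ByteArrayOutputStream;

public class ResetOrderSketch {
  // Hypothetical two-byte header standing in for whatever the GZ stream emits.
  static final byte[] HEADER = {(byte) 0x1f, (byte) 0x8b};

  /** Hypothetical stand-in for a compressing stream whose resetState() writes a header. */
  static final class HeaderOnResetStream {
    private final ByteArrayOutputStream sink;

    HeaderOnResetStream(ByteArrayOutputStream sink) {
      this.sink = sink;
    }

    void resetState() {
      sink.write(HEADER, 0, HEADER.length);
    }

    void write(byte[] data) {
      // No real compression here; the payload is copied through unchanged.
      sink.write(data, 0, data.length);
    }
  }

  /** Clears the buffer first, then resets the compressor, so the header survives. */
  static byte[] resetBufferThenState(byte[] payload) {
    ByteArrayOutputStream compressedStream = new ByteArrayOutputStream();
    HeaderOnResetStream compressingStream = new HeaderOnResetStream(compressedStream);
    compressedStream.reset();
    compressingStream.resetState();
    compressingStream.write(payload);
    return compressedStream.toByteArray();
  }

  /** Resets the compressor first, then clears the buffer, which discards the header. */
  static byte[] resetStateThenBuffer(byte[] payload) {
    ByteArrayOutputStream compressedStream = new ByteArrayOutputStream();
    HeaderOnResetStream compressingStream = new HeaderOnResetStream(compressedStream);
    compressingStream.resetState();
    compressedStream.reset();
    compressingStream.write(payload);
    return compressedStream.toByteArray();
  }

  public static void main(String[] args) {
    byte[] payload = {1, 2, 3};
    System.out.println("buffer reset first: " + resetBufferThenState(payload).length + " bytes");
    System.out.println("state reset first:  " + resetStateThenBuffer(payload).length + " bytes");
  }
}
```

With the three-byte payload used in {{main}}, the first ordering yields 5 bytes (header plus payload) and the second yields 3 bytes (payload only), which mirrors the "header is gone" symptom reported in the comment.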