[ https://issues.apache.org/jira/browse/HBASE-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724637#comment-16724637 ]
Zheng Hu commented on HBASE-21401:
----------------------------------
bq. I looked at the patch and I still see double-parse, no? (Once to check
byte array contains a wholesome KV and then the usual parse that happens as
part of KV usage?). Was thinking we could check wholesomeness inline with use?
Yes, it's a double parse now: once to check that the KV is wholesome, then again
to parse specific fields such as row/family/qualifier/ts/type and so on. I did
not move the wholesomeness check inline with use, because I found that in the
upper layer, cell.getRowOffset() and cell.getRowLength() are called many
times. Take scan processing as an example:
step.1 Load the block from the HFile and let the cell reference the block;
step.2 Compare the row part with the given startRow or stopRow of the scan,
calling cell.getRowOffset() and cell.getRowLength();
step.3 Merge with other HFiles, which still needs to compare the row part,
calling cell.getRowOffset() and cell.getRowLength() again;
step.4 Apply filters, comparing the row/family/qualifier/value;
step.5 Merge with other stores, comparing the row part again ...
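The steps above can be sketched roughly as follows. This is a minimal illustration, not HBase code: `BlockBackedCell`, `compareRow`, `inScanRange` and `smallerRow` are hypothetical names, and the cell is reduced to a row slice of a shared block. It only shows that each stage reads `getRowOffset()`/`getRowLength()` again, which the `accessorCalls` counter makes visible. `Arrays.compareUnsigned` with ranges needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScanPathSketch {

    /** Hypothetical cell whose row is the slice [rowOffset, rowOffset + rowLength) of a shared block (step 1). */
    static final class BlockBackedCell {
        private final byte[] block;
        private final int rowOffset;
        private final short rowLength;
        int accessorCalls = 0; // how many times getRowOffset()/getRowLength() were read

        BlockBackedCell(byte[] block, int rowOffset, short rowLength) {
            this.block = block;
            this.rowOffset = rowOffset;
            this.rowLength = rowLength;
        }

        byte[] getRowArray() { return block; }
        int getRowOffset() { accessorCalls++; return rowOffset; }
        short getRowLength() { accessorCalls++; return rowLength; }
    }

    /** Unsigned lexicographic comparison of the cell's row with another row. */
    static int compareRow(BlockBackedCell c, byte[] row) {
        int off = c.getRowOffset();
        int len = c.getRowLength();
        return Arrays.compareUnsigned(c.getRowArray(), off, off + len, row, 0, row.length);
    }

    /** Step 2: startRow is inclusive, stopRow exclusive; an empty stopRow means "no upper bound". */
    static boolean inScanRange(BlockBackedCell c, byte[] startRow, byte[] stopRow) {
        return compareRow(c, startRow) >= 0 && (stopRow.length == 0 || compareRow(c, stopRow) < 0);
    }

    /** Steps 3 and 5: a merge compares the rows of two candidate cells and keeps the smaller one. */
    static BlockBackedCell smallerRow(BlockBackedCell a, BlockBackedCell b) {
        int aOff = a.getRowOffset(), aLen = a.getRowLength();
        int bOff = b.getRowOffset(), bLen = b.getRowLength();
        int cmp = Arrays.compareUnsigned(a.getRowArray(), aOff, aOff + aLen,
                                         b.getRowArray(), bOff, bOff + bLen);
        return cmp <= 0 ? a : b;
    }

    public static void main(String[] args) {
        byte[] block = "xxrow-byyrow-a".getBytes(StandardCharsets.US_ASCII);
        BlockBackedCell b = new BlockBackedCell(block, 2, (short) 5); // row "row-b"
        BlockBackedCell a = new BlockBackedCell(block, 9, (short) 5); // row "row-a"

        boolean inRange = inScanRange(b, "row-a".getBytes(StandardCharsets.US_ASCII),
                                         "row-c".getBytes(StandardCharsets.US_ASCII));
        BlockBackedCell first = smallerRow(a, b);
        System.out.println("in range: " + inRange
                + ", merge picks row at offset " + first.rowOffset
                + ", row accessor reads on b: " + b.accessorCalls);
    }
}
```

Every further stage (filters, store-level merges) adds more reads of the same two accessors, which is the point the comment makes about the upper layer.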
I mean that getRowOffset() and getRowLength() (or
getFamilyOffset()/getFamilyLength() ...) will be used in the upper layer many
times. If we move the row sanity check into getRowOffset() and getRowLength(),
and the family sanity check into getFamilyOffset() and getFamilyLength() ...,
then the sanity check will parse the relevant fields that many times too, won't
it? That cost would be even larger than the double parse, so I think the double
parse is better in our case.
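To make the cost argument concrete, here is a hypothetical sketch (not the patch itself) that contrasts checking once at construction with checking inside every accessor. It assumes a simplified buffer holding only the KeyValue length prefixes and the row (`int keyLength`, `int valueLength`, `short rowLength`, row bytes); `EagerCell`, `LazyCell`, `checkedRowLength` and the `rowChecks` counter are illustrative names. With five upper-layer stages that each read the row offset and length, the eager cell performs one check while the lazy one performs ten.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CheckPlacementSketch {

    // Row length sits after the int keyLength and int valueLength prefixes of a KeyValue.
    static final int ROW_LENGTH_POS = 8;
    static final int ROW_POS = ROW_LENGTH_POS + 2;

    // Illustrative counter: how many times the row length was decoded and bounds-checked.
    static int rowChecks = 0;

    /** Decodes the row length and verifies that the row fits inside [offset, offset + length). */
    static short checkedRowLength(byte[] buf, int offset, int length) {
        rowChecks++;
        if (length < ROW_POS) {
            throw new IllegalArgumentException("buffer too short for the row length field");
        }
        short rowLength = ByteBuffer.wrap(buf, offset + ROW_LENGTH_POS, 2).getShort();
        if (rowLength < 0 || ROW_POS + rowLength > length) {
            throw new IllegalArgumentException("row length " + rowLength + " overflows the buffer");
        }
        return rowLength;
    }

    /** Validates once, at construction; accessors just return cached values. */
    static final class EagerCell {
        private final int offset;
        private final short rowLength;

        EagerCell(byte[] buf, int offset, int length) {
            this.offset = offset;
            this.rowLength = checkedRowLength(buf, offset, length);
        }

        int getRowOffset() { return offset + ROW_POS; }
        short getRowLength() { return rowLength; }
    }

    /** Defers the check into each accessor, so every call re-decodes and re-checks the row. */
    static final class LazyCell {
        private final byte[] buf;
        private final int offset;
        private final int length;

        LazyCell(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        int getRowOffset() {
            checkedRowLength(buf, offset, length);
            return offset + ROW_POS;
        }

        short getRowLength() { return checkedRowLength(buf, offset, length); }
    }

    /** Each simulated stage (range check, merges, filters ...) reads the row offset and length once. */
    static int checksWithEagerCell(byte[] buf, int stages) {
        rowChecks = 0;
        EagerCell cell = new EagerCell(buf, 0, buf.length);
        for (int i = 0; i < stages; i++) {
            cell.getRowOffset();
            cell.getRowLength();
        }
        return rowChecks;
    }

    static int checksWithLazyCell(byte[] buf, int stages) {
        rowChecks = 0;
        LazyCell cell = new LazyCell(buf, 0, buf.length);
        for (int i = 0; i < stages; i++) {
            cell.getRowOffset();
            cell.getRowLength();
        }
        return rowChecks;
    }

    /** Simplified buffer: only the two length prefixes and the row, enough for this comparison. */
    static byte[] sampleRowOnlyBuffer(String row) {
        byte[] r = row.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(ROW_POS + r.length)
                .putInt(2 + r.length)        // keyLength: only the row part of the key is modeled
                .putInt(0)                   // valueLength
                .putShort((short) r.length)  // rowLength
                .put(r)
                .array();
    }

    public static void main(String[] args) {
        byte[] buf = sampleRowOnlyBuffer("row-1");
        System.out.println("eager: " + checksWithEagerCell(buf, 5)
                + " checks, lazy: " + checksWithLazyCell(buf, 5) + " checks");
    }
}
```

The lazy variant's cost grows with every accessor call and would multiply again once family and qualifier checks are added, whereas the eager variant pays a fixed cost per cell.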
Please correct me if I misunderstood or missed something.
> Sanity check when constructing the KeyValue
> -------------------------------------------
>
> Key: HBASE-21401
> URL: https://issues.apache.org/jira/browse/HBASE-21401
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21401.v1.patch, HBASE-21401.v2.patch,
> HBASE-21401.v3.patch, HBASE-21401.v4.patch, HBASE-21401.v4.patch,
> HBASE-21401.v5.patch, HBASE-21401.v6.patch, HBASE-21401.v7.patch
>
>
> In KeyValueDecoder & ByteBuffKeyValueDecoder, we pass a byte buffer to
> initialize the Cell without a sanity check (i.e. checking whether each field's
> offset and length exceed the byte buffer), so an ArrayIndexOutOfBoundsException
> may be thrown when reading the cell's fields, as in HBASE-21379, and this kind
> of bug is hard to debug.
> An earlier check would help find such bugs.
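A construction-time check of the kind the description asks for could look like the sketch below. It is hypothetical (`checkKeyValue` and `sampleKeyValue` are illustrative names, not the helper added by the patch) and follows the KeyValue serialization layout: `int keyLength`, `int valueLength`, then the key (`short rowLength`, row, `byte familyLength`, family, qualifier, `long timestamp`, `byte type`), the value, and optional tags (`short tagsLength`, tags). A bad offset or length is reported as a descriptive `IllegalArgumentException` at decode time instead of an `ArrayIndexOutOfBoundsException` later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyValueCheckSketch {

    private static IllegalArgumentException bad(String what, int offset, int length) {
        return new IllegalArgumentException(
                "Invalid KeyValue at offset " + offset + ", length " + length + ": " + what);
    }

    /**
     * Walks the KeyValue layout once and verifies that every field lies inside
     * buf[offset, offset + length); throws instead of letting a later accessor fail.
     */
    static void checkKeyValue(byte[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || (long) offset + length > buf.length) {
            throw bad("offset/length outside the backing array", offset, length);
        }
        if (length < 8) {
            throw bad("missing keyLength/valueLength prefix", offset, length);
        }
        ByteBuffer bb = ByteBuffer.wrap(buf); // absolute gets below use array indices
        int end = offset + length;
        int keyLength = bb.getInt(offset);
        int valueLength = bb.getInt(offset + 4);
        int keyStart = offset + 8;
        if (keyLength < 0 || valueLength < 0 || (long) keyStart + keyLength + valueLength > end) {
            throw bad("keyLength=" + keyLength + ", valueLength=" + valueLength, offset, length);
        }
        int keyEnd = keyStart + keyLength;
        int pos = keyStart;

        // Row: 2-byte length followed by the row bytes.
        if (pos + 2 > keyEnd) throw bad("no room for rowLength", offset, length);
        short rowLength = bb.getShort(pos);
        pos += 2;
        if (rowLength < 0 || pos + rowLength > keyEnd) throw bad("rowLength=" + rowLength, offset, length);
        pos += rowLength;

        // Family: 1-byte length followed by the family bytes.
        if (pos + 1 > keyEnd) throw bad("no room for familyLength", offset, length);
        int familyLength = bb.get(pos) & 0xFF;
        pos += 1;
        if (pos + familyLength > keyEnd) throw bad("familyLength=" + familyLength, offset, length);
        pos += familyLength;

        // Qualifier is whatever remains before the 8-byte timestamp and 1-byte type.
        int qualifierLength = keyEnd - pos - 9;
        if (qualifierLength < 0) throw bad("no room for timestamp/type", offset, length);

        // Optional tags after the value: 2-byte length followed by the tag bytes.
        int valueEnd = keyEnd + valueLength;
        if (valueEnd < end) {
            if (valueEnd + 2 > end) throw bad("no room for tagsLength", offset, length);
            int tagsLength = bb.getShort(valueEnd) & 0xFFFF;
            if (valueEnd + 2 + tagsLength != end) throw bad("tagsLength=" + tagsLength, offset, length);
        }
    }

    /** Builds one KeyValue: row "r1", family "f", qualifier "q", timestamp 1, type Put, value "v", no tags. */
    static byte[] sampleKeyValue() {
        byte[] row = "r1".getBytes(StandardCharsets.US_ASCII);
        byte[] family = "f".getBytes(StandardCharsets.US_ASCII);
        byte[] qualifier = "q".getBytes(StandardCharsets.US_ASCII);
        byte[] value = "v".getBytes(StandardCharsets.US_ASCII);
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        return ByteBuffer.allocate(8 + keyLength + value.length)
                .putInt(keyLength).putInt(value.length)
                .putShort((short) row.length).put(row)
                .put((byte) family.length).put(family)
                .put(qualifier)
                .putLong(1L)
                .put((byte) 4) // type byte; 4 is the code HBase uses for Put
                .put(value)
                .array();
    }

    public static void main(String[] args) {
        byte[] kv = sampleKeyValue();
        checkKeyValue(kv, 0, kv.length);
        System.out.println("well-formed KeyValue accepted");
        try {
            checkKeyValue(kv, 0, kv.length - 5); // simulate a truncated buffer
        } catch (IllegalArgumentException e) {
            System.out.println("truncated KeyValue rejected: " + e.getMessage());
        }
    }
}
```

The sketch only checks bounds; it does not validate the type byte's value or the tag contents.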
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)