[
https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754321#action_12754321
]
ryan rawson commented on HBASE-1818:
------------------------------------
So if compress() returns null (which it should not), some kind of error
occurred, such as the DFS being out to lunch, etc. Returning 'null' - which is
interpreted as 'no such meta block' - really breaks the API contract here.
Just let it NPE. That is better than converting an IO error into a logical
error, as your code would do.
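A minimal Java sketch of the two contracts being argued about. This is not the actual HFile code; the names readMetaBlock* and decompress are hypothetical stand-ins for the real read path:

```java
import java.io.IOException;

public class MetaBlockContract {
    // The style argued against: swallowing a decompression/IO failure and
    // returning null makes the failure indistinguishable from the
    // legitimate "no such meta block" answer.
    static byte[] readMetaBlockLossy(String name) {
        try {
            return decompress(name);
        } catch (IOException e) {
            return null; // caller now thinks the block simply does not exist
        }
    }

    // The style the comment advocates: let the IO error surface. If this
    // method ever returns null, null means "absent" and nothing else.
    static byte[] readMetaBlockStrict(String name) throws IOException {
        return decompress(name); // IOException propagates to the caller
    }

    // Stand-in for the real decompression path; fails like a broken DFS
    // read when handed a null name.
    static byte[] decompress(String name) throws IOException {
        if (name == null) {
            throw new IOException("DFS out to lunch");
        }
        return name.getBytes();
    }
}
```

With the lossy variant, a caller has no way to tell an IO failure apart from a missing block; with the strict variant the same failure arrives as the IOException it actually was.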
As for the duplicate key, the block index is based on the last key in each
block. So if we have two duplicate keys that straddle a block boundary, the
scenario is like so:
- the last key of block X is key 'A', and that key is entered in the index
- the first key of block X+1 is also key 'A', but a first key is never part of the index
Scanning will start at block X, and we will see both.
The only scenario where we could potentially have an issue is if we have
enough duplicate keys that they span more than one block, causing two index
entries to have the same key. The binary search algorithm will need to choose
the first such entry to maintain correct behaviour on scan, and I am not 100%
sure that is what will happen. But even so, this is very rare: since in HBase
keys are distinguished by timestamps, it would require either many keys with
the same TS, or a few really large ones with the same TS. Something to consider
testing for.
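The "choose the first entry" requirement above is exactly a lower-bound binary search. A plain Arrays.binarySearch makes no guarantee about which of several equal keys it lands on; a lower-bound variant does. A small illustrative sketch (BlockIndexSearch and lowerBound are hypothetical names, with the index modelled as an array of each block's last key):

```java
public class BlockIndexSearch {
    // Lower-bound binary search: returns the index of the FIRST entry
    // whose key is >= the search key. Since the index stores the last
    // key of each block, this is the first block that can contain the
    // key, so a scan starting there sees every duplicate.
    static int lowerBound(String[] lastKeys, String key) {
        int lo = 0, hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;   // entire left half is strictly smaller
            } else {
                hi = mid;       // mid might be the first match; keep it
            }
        }
        return lo; // == lastKeys.length when the key is past every block
    }
}
```

For example, with last keys {"A", "C", "C", "E"} (duplicate key 'C' spanning two blocks), a lower-bound search for "C" picks entry 1, the first 'C' block, so the scan starts early enough to return both copies; a search that happened to pick entry 2 would silently skip the first block's duplicates.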
More importantly, if we disallow duplicate keys in hfile, that will cause huge
problems. Right now there is no other mechanism to prevent duplicate KeyValues
from being inserted - it's your own lunch if you put multiple values at the
same timestamp. But this change would cause compactions to throw and prevent
them from completing, a far worse scenario.
> HFile code review and refinement
> --------------------------------
>
> Key: HBASE-1818
> URL: https://issues.apache.org/jira/browse/HBASE-1818
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.20.0
> Reporter: Schubert Zhang
> Assignee: Schubert Zhang
> Priority: Minor
> Fix For: 0.21.0
>
> Attachments: HFile-v3.patch
>
>
> HFile is a good mimic of Google's SSTable file format, and we want HFile to
> become a common file format of Hadoop in the near future.
> We will review the code of HFile, record the comments here, and then
> provide a fixed patch after the review.