[
https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755844#action_12755844
]
Schubert Zhang commented on HBASE-1818:
---------------------------------------
Thanks, stack, for creating the new issue (HBASE-1841).
Regarding the two potential fixes:
- fix binary search algorithm to actually find the lower bound in the face of
duplicates.
I think we may need to change the block index to use each block's last key instead?
- prevent hfiles like the one indicated above from being created, in this
case by extending block 1 larger than the default sizing until we get a
different key.
In fact, we used this approach in one of our older products, i.e. we only started a new
block/index entry at a boundary between different keys. In this case, we should ensure
that the number of duplicated keys is not too large (otherwise it will lead to a very big block).
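To make the two options concrete, here is a minimal sketch in Java. It is not the actual HFile code: the class and method names are hypothetical, keys are plain String values instead of byte[] with a comparator, and block size is counted in entries rather than bytes. The first method looks up a block in an index built from each block's last key; the second only closes a block when the key changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
class BlockIndexSketch {

    // Option 1: the index holds the LAST key of every block.
    // A lower-bound binary search returns the first block whose last key
    // is >= target, i.e. the first block that can contain the target,
    // even when the same key spans several blocks. Returns -1 if the
    // target is greater than every key in the file.
    static int findBlockByLastKey(String[] lastKeys, String target) {
        int lo = 0;
        int hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(target) < 0) {
                lo = mid + 1;   // block mid ends before target
            } else {
                hi = mid;       // block mid may contain target
            }
        }
        return lo < lastKeys.length ? lo : -1;
    }

    // Option 2: a block is closed only when it has reached the target
    // size AND the next key differs from the last key written, so a run
    // of duplicate keys never straddles a block boundary.
    static List<List<String>> splitAtKeyBoundaries(List<String> sortedKeys,
                                                   int targetEntries) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            boolean full = current.size() >= targetEntries;
            boolean keyChanged = !current.isEmpty()
                    && !current.get(current.size() - 1).equals(key);
            if (full && keyChanged) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);   // a duplicate run extends the block past its target
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }
}
```

With keys a, b, b, b, b, c and a target of 3 entries per block, the second method produces one block of five entries followed by one block holding only c, which shows the trade-off mentioned above: a long run of duplicates makes the block grow without limit unless the run length is bounded.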
> HFile code review and refinement
> --------------------------------
>
> Key: HBASE-1818
> URL: https://issues.apache.org/jira/browse/HBASE-1818
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.20.0
> Reporter: Schubert Zhang
> Assignee: Schubert Zhang
> Priority: Minor
> Fix For: 0.21.0
>
> Attachments: HFile-v3.patch, HFile-v4.patch, HFile-v5.patch
>
>
> HFile is a good imitation of Google's SSTable file format, and we want HFile to
> become a common file format for Hadoop in the near future.
> We will review the HFile code and record our comments here, and then
> provide a patch with fixes after the review.