[ 
https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755701#action_12755701
 ] 

ryan rawson commented on HBASE-1818:
------------------------------------

In reference to the block index: yes, the scenario where duplicate keys span 
a block boundary exists. 

It's possible that we could fix these holes with a different write strategy 
which doesn't create invalid hfiles like the one you theorized above. Another 
scenario is duplicate key entries in the index itself, which could cause 
problems with the binary search algorithm.

There are 2 potential fixes here:
- fix the binary search algorithm to actually find the _lower bound_ in the 
face of duplicates.
- prevent hfiles like the one indicated above from being created, in this case 
by extending block 1 beyond the default size until we get a different key.
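
To make the first option concrete, here is a minimal sketch of a lower-bound 
search over the block index. It is not the actual HFile code: the class and 
method names (BlockIndexSearch, lowerBound, firstCandidateBlock) are 
hypothetical, and keys are modeled as Strings rather than the byte[] keys 
HFile really uses. The idea is to start reading from the block just before the 
first index entry that is >= the search key, since that earlier block may end 
with the same key.

```java
// Hypothetical sketch; names and the String key type are assumptions,
// not taken from the HFile source.
final class BlockIndexSearch {
    private BlockIndexSearch() {}

    /**
     * Returns the smallest position i such that firstKeys[i] >= key,
     * or firstKeys.length if every entry is smaller than key.
     * firstKeys must be sorted ascending; duplicates are allowed.
     */
    static int lowerBound(String[] firstKeys, String key) {
        int lo = 0;
        int hi = firstKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;   // everything up to mid is < key
            } else {
                hi = mid;       // mid could still be the first entry >= key
            }
        }
        return lo;
    }

    /**
     * Returns the first block that can contain key, or -1 for an empty index.
     * The block before the lower bound is chosen because its tail may hold
     * the first occurrence of key when duplicates span a block boundary.
     */
    static int firstCandidateBlock(String[] firstKeys, String key) {
        if (firstKeys.length == 0) {
            return -1;
        }
        return Math.max(lowerBound(firstKeys, key) - 1, 0);
    }
}
```

With an index of first keys {"a", "c", "c"}, a lookup for "c" starts at block 
0 instead of landing on an arbitrary one of the two "c" entries. The reader 
then scans forward from there, which costs at most one extra block read.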

There might be other solutions too.
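
For the writer-side option (extending a block until the key changes), a rough 
sketch follows. Again this is only an illustration under stated assumptions: 
BlockSplitter and split are hypothetical names, keys are Strings, and the 
block size is counted in entries, whereas the real writer sizes blocks in 
bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the HFile writer. Block size is measured in
// entries here purely to keep the example short.
final class BlockSplitter {
    private BlockSplitter() {}

    /**
     * Splits sorted keys into blocks of roughly targetEntriesPerBlock entries,
     * but never closes a block while the next key equals the last key written.
     * As a result no run of duplicate keys spans a block boundary.
     */
    static List<List<String>> split(List<String> sortedKeys, int targetEntriesPerBlock) {
        if (targetEntriesPerBlock < 1) {
            throw new IllegalArgumentException("targetEntriesPerBlock must be >= 1");
        }
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            boolean full = current.size() >= targetEntriesPerBlock;
            boolean sameAsLast = !current.isEmpty()
                    && current.get(current.size() - 1).equals(key);
            // Only start a new block when the current one is full AND the
            // incoming key differs from the last key already in it.
            if (full && !sameAsLast) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }
}
```

The trade-off is that a long run of identical keys makes one block 
arbitrarily large, so a real implementation would probably want some upper 
limit or a fallback to the reader-side fix.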

> HFile code review and refinement
> --------------------------------
>
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: HFile-v3.patch, HFile-v4.patch
>
>
> HFile is a good mimic of Google's SSTable file format. And we want HFile to 
> become a common file format of hadoop in the near future.
> We will review the code of HFile and record the comments here, and then 
> provide fixed patch after the review.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
