[ 
https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754354#action_12754354
 ] 

Schubert Zhang commented on HBASE-1818:
---------------------------------------

@ryan
1. Your description of "return null" is good, i.e. null means the meta block 
does not exist.
     I accept your comment.
     Maybe we could throw a different Exception here? If you think that is not 
necessary, I will remove the code I added.

2.  In HFile.java, I found that the block index is based on the first key in 
each block (not the last key).
     Besides treating HFile as a part of HBase, we want HFile to be a common 
file format that can be used in other applications. In fact, I would like to 
support duplicate keys in an HFile, since my application uses HFile directly 
to store data. But when I checked the code, I found a risk in adding 
duplicate keys, e.g.
      - block 1:  firstKey=A, lastKey=B, indexKey=A
      - block 2:  firstKey=B, lastKey=C, indexKey=B
When seeking key=B, we go into block 2 and miss the entry with key=B at the 
end of block 1.
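
To make the scenario concrete, here is a minimal sketch in Java. It is a toy 
model, not the actual HFile code: the names FirstKeyIndexSketch, blockFor and 
scanFrom are made up for illustration, and the lookup rule (choose the last 
block whose index key is <= the seek key) is an assumption about how a 
first-key index is searched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the real HFile classes: each block is a sorted array of
// keys, and the index stores the FIRST key of every block.
class FirstKeyIndexSketch {

    // Assumed lookup rule: the last block whose index key is <= target.
    // Returns -1 if the target sorts before every index key.
    static int blockFor(String[] indexKeys, String target) {
        int block = -1;
        for (int i = 0; i < indexKeys.length; i++) {
            if (indexKeys[i].compareTo(target) <= 0) {
                block = i;
            }
        }
        return block;
    }

    // Seek to the chosen block, then scan forward and record every match.
    static List<String> scanFrom(String[][] blocks, String target) {
        String[] indexKeys = new String[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            indexKeys[i] = blocks[i][0]; // index key = first key of the block
        }
        List<String> hits = new ArrayList<>();
        int start = blockFor(indexKeys, target);
        if (start < 0) {
            return hits; // this sketch simply reports nothing in that case
        }
        for (int b = start; b < blocks.length; b++) {
            for (int k = 0; k < blocks[b].length; k++) {
                if (blocks[b][k].equals(target)) {
                    hits.add("block " + (b + 1) + " entry " + k);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[][] blocks = { {"A", "B"}, {"B", "C"} };
        // Prints [block 2 entry 0]: the B that ends block 1 is never visited.
        System.out.println(scanFrom(blocks, "B"));
    }
}
```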

Yes, you are right: if the data block index used the last key instead of the 
first key, it seems fine:
      - block 1:  firstKey=A, lastKey=B, indexKey=B 
      - block 2:  firstKey=B, lastKey=C, indexKey=C
When seeking key=B, we go into block 1. Scanner.seekTo() will find key=B in 
block 1, searching from the firstKey of block 1, and Scanner.next() will not 
miss the key=B in block 2.
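
The last-key variant can be sketched the same way in Java. Again this is a 
toy model and not the actual HFile code: LastKeyIndexSketch, blockFor and 
scanFrom are hypothetical names, and the lookup rule (choose the first block 
whose index key is >= the seek key) is an assumption about how a last-key 
index would be searched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the real HFile classes: each block is a sorted array of
// keys, and the index stores the LAST key of every block.
class LastKeyIndexSketch {

    // Assumed lookup rule: the first block whose index key is >= target.
    // Returns -1 if the target sorts after every index key.
    static int blockFor(String[] indexKeys, String target) {
        for (int i = 0; i < indexKeys.length; i++) {
            if (indexKeys[i].compareTo(target) >= 0) {
                return i;
            }
        }
        return -1;
    }

    // Seek to the chosen block, then scan forward and record every match.
    static List<String> scanFrom(String[][] blocks, String target) {
        String[] indexKeys = new String[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            // index key = last key of the block
            indexKeys[i] = blocks[i][blocks[i].length - 1];
        }
        List<String> hits = new ArrayList<>();
        int start = blockFor(indexKeys, target);
        if (start < 0) {
            return hits; // target is past the end of the file
        }
        for (int b = start; b < blocks.length; b++) {
            for (int k = 0; k < blocks[b].length; k++) {
                if (blocks[b][k].equals(target)) {
                    hits.add("block " + (b + 1) + " entry " + k);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[][] blocks = { {"A", "B"}, {"B", "C"} };
        // Prints [block 1 entry 1, block 2 entry 0]: both duplicates are seen.
        System.out.println(scanFrom(blocks, "B"));
    }
}
```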

But I double-checked the HFile code, and the block index currently uses the 
firstKey.

> HFile code review and refinement
> --------------------------------
>
>                 Key: HBASE-1818
>                 URL: https://issues.apache.org/jira/browse/HBASE-1818
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 0.20.0
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: HFile-v3.patch
>
>
> HFile is a good imitation of Google's SSTable file format, and we want HFile 
> to become a common file format for Hadoop in the near future.
> We will review the HFile code and record the comments here, and then 
> provide a fixed patch after the review.
