[
https://issues.apache.org/jira/browse/HBASE-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753403#action_12753403
]
Schubert Zhang commented on HBASE-1818:
---------------------------------------
@ryan
1. Regarding duplicate keys in an HFile, I have the following concern.
If we allow duplicate keys, consider this scenario:
A key="abcd" is appended as the last key/value pair of block1.
Then the same key="abcd" is appended as the first key/value pair of block2.
In the block index, the key="abcd" will then point to block2.
If we later scan from key="abcd", the first key="abcd" (the last entry of
block1) will be missed.
Can you confirm whether this behavior is acceptable or required?
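The scenario can be sketched with a toy model. This is only an illustration under stated assumptions, not the real HFile code: the class DuplicateKeySeekSketch and its methods seekBlock and scanFrom are hypothetical, each block is modeled as a sorted list of string keys, and the index is assumed to hold only the first key of each block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not the real HFile classes: each block is a sorted
// list of keys, and the block index holds only the first key of each block.
class DuplicateKeySeekSketch {

    // Returns the index of the last block whose first key is <= the target
    // key, or -1 if the target sorts before every block.
    static int seekBlock(List<String> firstKeys, String key) {
        int lo = 0, hi = firstKeys.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid).compareTo(key) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Scans from the block chosen by the index and collects every key >= target.
    static List<String> scanFrom(List<List<String>> blocks, String key) {
        List<String> firstKeys = new ArrayList<>();
        for (List<String> block : blocks) {
            firstKeys.add(block.get(0));
        }
        int start = Math.max(seekBlock(firstKeys, key), 0);
        List<String> out = new ArrayList<>();
        for (int b = start; b < blocks.size(); b++) {
            for (String k : blocks.get(b)) {
                if (k.compareTo(key) >= 0) {
                    out.add(k);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("abca", "abcb", "abcd"),  // block1 ends with "abcd"
            Arrays.asList("abcd", "abce"));         // block2 starts with "abcd"
        System.out.println(scanFrom(blocks, "abcd"));
    }
}
```

In this model the scan returns only one "abcd": the index lookup lands on block2, so the copy at the end of block1 is never visited.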
2. + if (buf == null)
+ return null;
This check is added only in getMetaBlock(...). With it, the method has three
points that return null:
(1) if (trailer.metaIndexCount == 0) {
return null; // there are no meta blocks
}
(2) if (block == -1)
return null;
(3) if (buf == null) // newly added by me
return null;
Without this check, the following buf.get(..) may throw an NPE, because the
decompress() method does not throw an exception. Do you mean that an NPE is
better than "return null", which is what the two cases above already do?
In fact, this trade-off is difficult for me to judge; maybe I am doing it the
C++ way.
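To make the three return points concrete, here is a minimal sketch. It is not the real HFile reader: MetaBlockNullSketch, its two lists, and the one-byte marker skipped by buf.get() are all assumptions for illustration, and the stand-in decompress() simply returns null when it has no data.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical stand-in for a meta-block reader; not the real HFile reader.
class MetaBlockNullSketch {
    private final List<String> metaNames;     // stand-in for the meta index
    private final List<byte[]> metaPayloads;  // a null entry models a failed decompress

    MetaBlockNullSketch(List<String> metaNames, List<byte[]> metaPayloads) {
        this.metaNames = metaNames;
        this.metaPayloads = metaPayloads;
    }

    // Stand-in for decompress(): returns null instead of throwing.
    private ByteBuffer decompress(int block) {
        byte[] raw = metaPayloads.get(block);
        return raw == null ? null : ByteBuffer.wrap(raw);
    }

    ByteBuffer getMetaBlock(String name) {
        if (metaNames.isEmpty()) {
            return null; // (1) there are no meta blocks
        }
        int block = metaNames.indexOf(name);
        if (block == -1) {
            return null; // (2) no meta block with this name
        }
        ByteBuffer buf = decompress(block);
        if (buf == null) {
            return null; // (3) the added guard; without it buf.get() below throws an NPE
        }
        buf.get();           // skip a one-byte marker (assumed layout)
        return buf.slice();  // remaining bytes, with position 0
    }
}
```

With the guard, all three failure cases look the same to the caller (a null result); without it, only the third case would surface as a NullPointerException.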
3. Regarding buf.compact().
Yes, you may be right. After more performance testing, my patch does not
improve performance (I don't know whether it would in some other
environments). I agree to remove this modification from my patch so that the
returned block buffer stays clean (position at 0).
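For reference, a small sketch of how java.nio.ByteBuffer.compact() moves the position, compared with slice(). It does not reproduce what the patch did; the class CompactPositionSketch and its helper methods are hypothetical and only show the standard ByteBuffer behavior.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; only demonstrates standard java.nio.ByteBuffer behavior.
class CompactPositionSketch {

    // Returns {position, limit} after compact() on a fresh buffer in which
    // `consumed` bytes have already been read.
    static int[] afterCompact(int capacity, int consumed) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.position(consumed);
        buf.compact(); // copies the unread bytes to index 0, position = count copied
        return new int[] { buf.position(), buf.limit() };
    }

    // Returns the position of the view produced by slice() in the same situation.
    static int afterSlice(int capacity, int consumed) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.position(consumed);
        return buf.slice().position(); // a slice always starts at position 0
    }
}
```

After compact() the position sits just past the copied bytes rather than at 0, so a caller that expects to read the block from position 0 would have to reset it first.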
@stack and ryan
Thanks for your testing. I will change the patch according to your comments so
that it includes only bug fixes.
If the tests fail, please just revert to the old version.
Please comment on my questions above so that I can act on them
immediately. Thanks.
> HFile code review and refinement
> --------------------------------
>
> Key: HBASE-1818
> URL: https://issues.apache.org/jira/browse/HBASE-1818
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.20.0
> Reporter: Schubert Zhang
> Assignee: Schubert Zhang
> Priority: Minor
> Fix For: 0.21.0
>
> Attachments: HFile-v1.patch, HFile-v2.patch
>
>
> HFile is a good imitation of Google's SSTable file format, and we want HFile
> to become a common file format for Hadoop in the near future.
> We will review the HFile code and record the comments here, and then
> provide a fixed patch after the review.