[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718791#action_12718791 ]
Jothi Padmanabhan commented on HADOOP-5779:
-------------------------------------------

Some minor comments:
# I think endChar cannot be negative, so the check endChar < 0 can be removed. Could you check?
# Instead of doing i <= end && i < b.length in hashCode(), I think we should ideally fix getEndOffset to return min(end, b.length - 1). But I would not -1 for that; I am OK with the existing simple change in the patch as well.
# In the test case, adding an assert to verify that the returned partition is 0 would be good.

> KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-5779
> URL: https://issues.apache.org/jira/browse/HADOOP-5779
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: ZhuGuanyin
> Fix For: 0.21.0
>
> Attachments: encode-free-KeyFieldBasedPartitioner-v1.patch, encode-free-KeyFieldBasedPartitioner.patch, HADOOP-5779-partial.patch, HADOOP-5779-v1.0.patch.patch
>
>
> 1) Currently, KeyFieldBasedPartitioner only supports UTF-8 encoded records; we should use the Text or BytesWritable data types.
> 2) When using KeyFieldBasedPartitioner, if the record doesn't contain the specified field, endChar equals array.length, which throws an ArrayIndexOutOfBoundsException, and that record is lost.
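
The bounds handling suggested in the second review comment can be sketched roughly as below. This is only an illustration, not the code from any of the attached patches: the class PartitionSketch and the methods clampEnd, hashRange and partitionFor are hypothetical names, the 31-based hash is an assumed stand-in for whatever hashCode() actually computes, and start is assumed to be non-negative.

```java
// Hypothetical sketch; class and method names are illustrative only
// and are not taken from the HADOOP-5779 patches.
final class PartitionSketch {

    // Clamp an inclusive end offset so it never points past the last byte.
    // Mirrors the suggestion "return min(end, b.length - 1)".
    static int clampEnd(int end, int length) {
        return Math.min(end, length - 1);
    }

    // Hash the bytes b[start..end] (inclusive) into currentHash.
    // Because the end offset is clamped first, the loop needs only the
    // single condition i <= last. Assumes start >= 0.
    static int hashRange(byte[] b, int start, int end, int currentHash) {
        int last = clampEnd(end, b.length);
        for (int i = start; i <= last; i++) {
            currentHash = 31 * currentHash + b[i];
        }
        return currentHash;
    }

    // Map a hash to a partition index in [0, numPartitions).
    static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this shape, an end offset equal to b.length hashes the same bytes as b.length - 1 instead of throwing, and a range that lies entirely past the end of the record leaves the hash unchanged. Under these assumptions, a record without the requested field and an initial hash of 0 would land in partition 0.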