[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8

Jothi Padmanabhan (JIRA) Fri, 12 Jun 2009 03:43:34 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718791#action_12718791 ]


Jothi Padmanabhan commented on HADOOP-5779:
-------------------------------------------

Some minor comments:

# I think endChar cannot be negative, so the check endChar < 0 can be removed. 
Could you check?
# Instead of checking i <= end && i < b.length in hashCode(), I think we 
should ideally fix getEndOffset to return min(end, b.length - 1). That said, I 
would not -1 the patch over this; I am OK with the existing simple change in 
the patch as well.
# In the test case, it would be good to add an assert verifying that the 
returned partition is 0.
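
To make the second and third points concrete, here is a minimal sketch of the 
clamping alternative. It is not the Hadoop source or the attached patch: the 
class name ClampedFieldHash and the helper clampEnd are hypothetical, and the 
31-based rolling hash and the non-negative modulo in getPartition are 
assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names and hash details are assumptions.
final class ClampedFieldHash {

    // Hypothetical helper: clamp an inclusive end offset to the last valid index.
    static int clampEnd(int end, int length) {
        return Math.min(end, length - 1);
    }

    // Hash the bytes b[start..end] (inclusive) after clamping end,
    // so the loop needs no extra i < b.length guard.
    static int hashCode(byte[] b, int start, int end, int currentHash) {
        int last = clampEnd(end, b.length);
        for (int i = start; i <= last; i++) {
            currentHash = 31 * currentHash + b[i];
        }
        return currentHash;
    }

    // Map a hash to a partition index in [0, numReduceTasks).
    static int getPartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        byte[] key = "abc".getBytes(StandardCharsets.UTF_8);

        // end == key.length would overrun the array without the clamp.
        int h = hashCode(key, 0, key.length, 0);
        System.out.println("partition = " + getPartition(h, 4));

        // A missing field gives an empty range: the hash stays 0,
        // so the record lands in partition 0.
        int missing = hashCode(key, key.length, key.length, 0);
        System.out.println("missing-field partition = " + getPartition(missing, 4));
    }
}
```

With the clamp in one place, every caller of the offset gets the bound for 
free, whereas the in-loop guard only protects the one loop it is written in. 
The second call in main is the situation a test assert on partition 0 would 
cover.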

> KeyFieldBasedPartitioner would lost data if specifed field not exist, and it 
> should encode free not only support utf8
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5779
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5779
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: ZhuGuanyin
>             Fix For: 0.21.0
>
>         Attachments: encode-free-KeyFieldBasedPartitioner-v1.patch, 
> encode-free-KeyFieldBasedPartitioner.patch, HADOOP-5779-partial.patch, 
> HADOOP-5779-v1.0.patch.patch
>
>
> 1) Currently, KeyFieldBasedPartitioner only supports UTF-8 encoded records; 
> we should use the Text or BytesWritable data types.
> 2) When using KeyFieldBasedPartitioner, if the record doesn't contain the 
> specified field, endChar equals array.length, which throws an 
> ArrayIndexOutOfBoundsException, and that record is lost!
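
The failure described in point 2 can be reproduced in isolation. The sketch 
below is a hypothetical stand-in, not the Hadoop code: unguardedHash and 
overruns are illustrative names, and the hash formula is an assumption. It 
only shows that an inclusive loop whose end offset equals the array length 
reads one byte past the end.

```java
// Illustrative reproduction only; names and hash details are assumptions.
final class MissingFieldOverrun {

    // Inclusive-range hash with no bounds guard on end.
    static int unguardedHash(byte[] b, int start, int end) {
        int hash = 0;
        for (int i = start; i <= end; i++) {
            hash = 31 * hash + b[i]; // throws when i == b.length
        }
        return hash;
    }

    // Returns true if hashing b[0..end] runs off the end of the array.
    static boolean overruns(byte[] b, int end) {
        try {
            unguardedHash(b, 0, end);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] record = {1, 2, 3};
        System.out.println(overruns(record, record.length - 1)); // last valid index
        System.out.println(overruns(record, record.length));     // one past the end
    }
}
```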

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
