[jira] Commented: (MAPREDUCE-735) ArrayIndexOutOfBoundsException is thrown by KeyFieldBasedPartitioner

Iyappan Srinivasan (JIRA) Mon, 13 Jul 2009 05:55:40 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730331#action_12730331 ]


Iyappan Srinivasan commented on MAPREDUCE-735:
----------------------------------------------

Tested the scenarios below; all of them passed.

Input used for some of the comparator scenarios below:
3.6.2.8.9.12.43
3.6.1.8.9.12.43
3.6.6.8.9.12.43
3.6.5.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.2.8.9.12.43
3.6.9.8.9.12.43
3.6.3.8.9.12.43
3.6.1.8.9.12.43
3.6.5.8.9.12.43
3.6.2.8.9.12.43
3.6.1.8.9.12.43
1.7.8.6.3.2.4.7


1) Sort numerically on the third field, in reverse order.

bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1 \
  -Dmapred.text.key.partitioner.options=-k1,1 \
  -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -Dmap.output.key.field.separator=. \
  -Dmapred.text.key.comparator.options=-k3,3nr \
  -input input1/inputfile2 -mapper /bin/cat \
  -reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output2

Output:
3.6.9.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
1.7.8.6.3.2.4.7
3.6.6.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.3.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
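
The ordering in scenario 1 can be reproduced outside Hadoop. The following is a minimal Python sketch, not the KeyFieldBasedComparator implementation. It assumes keys are split on "." and that records with equal sort fields keep their input order; the function name sort_on_field is made up for illustration.

```python
def sort_on_field(lines, field, numeric=True, reverse=False, sep="."):
    """Sort lines on one 1-based field, roughly like -k<field>,<field>[n][r]."""
    def key(line):
        value = line.split(sep)[field - 1]
        return int(value) if numeric else value
    # sorted() is stable, also with reverse=True, so ties keep input order.
    return sorted(lines, key=key, reverse=reverse)


INPUT = """\
3.6.2.8.9.12.43
3.6.1.8.9.12.43
3.6.6.8.9.12.43
3.6.5.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.2.8.9.12.43
3.6.9.8.9.12.43
3.6.3.8.9.12.43
3.6.1.8.9.12.43
3.6.5.8.9.12.43
3.6.2.8.9.12.43
3.6.1.8.9.12.43
1.7.8.6.3.2.4.7
""".split()

# Equivalent in spirit to -k3,3nr on the input listed above.
result = sort_on_field(INPUT, field=3, numeric=True, reverse=True)
print("\n".join(result))
```

Passing reverse=False gives the ascending order checked in scenario 2.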


2) Sort numerically on the third field in normal (ascending) order, with no reversal.

bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1 \
  -Dmapred.text.key.partitioner.options=-k1,1 \
  -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -Dmap.output.key.field.separator=. \
  -Dmapred.text.key.comparator.options=-k3,3n \
  -input input1/inputfile2 -mapper /bin/cat \
  -reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output3

Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
1.7.8.6.3.2.4.7
3.6.9.8.9.12.43

3) Sort on the 7th field (numeric, reversed), then sort on the 3rd field within that result.

bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1 \
  -Dmapred.text.key.partitioner.options=-k1,1 \
  -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -Dmap.output.key.field.separator=. \
  -Dmapred.text.key.comparator.options="-k7,7nr -k3,3n" \
  -input input1/inputfile2 -mapper /bin/cat \
  -reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output8

Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.9.8.9.12.43
1.7.8.6.3.2.4.7
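
A two-level sort such as "-k7,7nr -k3,3n" can be pictured as sorting on a tuple of fields. The snippet below is only an illustrative Python sketch under that assumption (the helper name multi_key is hypothetical), using a few lines from the input above.

```python
def multi_key(line, sep="."):
    """Build a sort key equivalent in spirit to "-k7,7nr -k3,3n"."""
    fields = line.split(sep)
    # 7th field: numeric, descending, so negate it.
    # 3rd field: numeric, ascending, used only to break ties on the 7th.
    return (-int(fields[6]), int(fields[2]))


lines = [
    "3.6.2.8.9.12.43",
    "1.7.8.6.3.2.4.7",
    "3.6.9.8.9.12.43",
    "3.6.1.8.9.12.43",
]
ordered = sorted(lines, key=multi_key)
print("\n".join(ordered))
```

The line 1.7.8.6.3.2.4.7 has 4 in its 7th field, lower than 43, so it lands last, as in the output above.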


4) Check that a global option loses precedence when a key specifies its own (local) options.

bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1 \
  -Dmapred.text.key.partitioner.options=-k1,1 \
  -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -Dmap.output.key.field.separator=. \
  -Dmapred.text.key.comparator.options="-n -k7,7r -k3,3n" \
  -input input1/inputfile2 -mapper /bin/cat \
  -reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output15

Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.9.8.9.12.43
1.7.8.6.3.2.4.7
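
The global-versus-local rule being checked here can be sketched as follows. This is a Python illustration that assumes the same convention as Unix sort: a key that carries its own modifiers ignores the global ones. The names parse_options and make_comparator are hypothetical and do not come from Hadoop, and only the "-kN,N" form with "n" and "r" modifiers is handled.

```python
import functools
import re


def parse_options(options):
    """Turn e.g. "-n -k7,7r -k3,3n" into [(field, flags), ...]."""
    global_flags = ""
    keys = []
    for token in options.split():
        match = re.fullmatch(r"-k(\d+),(\d+)([nr]*)", token)
        if match:
            keys.append((int(match.group(1)), match.group(3)))
        else:
            global_flags += token.lstrip("-")
    # Assumption: local modifiers, when present, replace the global ones.
    return [(field, flags or global_flags) for field, flags in keys]


def make_comparator(options, sep="."):
    specs = parse_options(options)

    def compare(a, b):
        fa, fb = a.split(sep), b.split(sep)
        for field, flags in specs:
            x, y = fa[field - 1], fb[field - 1]
            if "n" in flags:
                x, y = int(x), int(y)
            result = (x > y) - (x < y)
            if result:
                return -result if "r" in flags else result
        return 0

    return functools.cmp_to_key(compare)


lines = ["3.6.2.8.9.12.43", "1.7.8.6.3.2.4.7", "3.6.1.8.9.12.43"]
ordered = sorted(lines, key=make_comparator("-n -k7,7r -k3,3n"))
print("\n".join(ordered))
```

With these options the 7th field is compared as reversed text (the local "r" drops the global "-n"), while the 3rd field is compared numerically.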

5) Fields that contain special characters such as "^", or letters such as "p" 
instead of numbers, are still sorted.

6) Breaking the file into two also gives correct results. The output is 
divided into two parts, each of which is sorted correctly, even for very large 
files. This holds for all the options.

7) If the column being sorted contains "i", "^", " ", or "" (empty), the record 
should be placed at the end.

8) Adding "-Dnum.key.fields.for.partition=5" makes no difference and does not 
cause any exception.

Scenarios for KeyFieldBasedPartitioner:

1) bin/hadoop jar hadoop-streaming.jar \
  -Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -Dmapred.text.key.comparator.options="-k5,5" -Dmapred.reduce.tasks=2 \
  -Dmapred.text.key.partitioner.options=-k5,5 \
  -Dmap.output.key.field.separator=" " \
  -input input1/inputfile2 -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
  -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
  -inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -output output8

It sorts text on the fifth field. I also tested other fields.

2) Even if "-Dnum.key.fields.for.partition=5" is added, it still works properly 
without any exception.

3) If the column being sorted contains "i", "^", " ", or "" (empty), it is 
sorted without any errors.
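
As a rough picture of what partitioning on a key field involves, here is a Python sketch. It is not the KeyFieldBasedPartitioner code: the function name partition, the use of CRC32 as the hash, and the fallback to an empty string for short keys are all assumptions made for illustration.

```python
import zlib


def partition(key, num_reduce_tasks, field=5, sep=" "):
    """Pick a reducer index from one 1-based field of the key."""
    fields = key.split(sep)
    # Guard against keys with fewer fields than requested instead of
    # indexing past the end of the list (assumed fallback: empty string).
    part = fields[field - 1] if len(fields) >= field else ""
    # CRC32 stands in for the real hash; it is deterministic across runs.
    return zlib.crc32(part.encode("utf-8")) % num_reduce_tasks


# Keys that share the fifth field go to the same reducer.
same = partition("a b c d e f", 2) == partition("x y z w e q", 2)
# A key with only two fields does not raise an exception.
short = partition("a b", 2)
print(same, short)
```

The guard on the field count mirrors the situation in the issue description, where the requested number of key fields exceeds what the input provides.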

Some points to note:
1) If the "-rn" option is used anywhere instead of "-nr", it does not work. This 
is as per the requirement.
2) If a -D option is misspelled, it is silently ignored. A config file checker 
for all the commands seems to be coming soon.
3) There is no documentation for this anywhere. MAPREDUCE-753 has been raised 
for it.


> ArrayIndexOutOfBoundsException is thrown by KeyFieldBasedPartitioner
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-735
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-735
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Suman Sehgal
>            Assignee: Amar Kamat
>         Attachments: HADOOP-6130-v1.0.patch, MAPREDUCE-735-v1.2.patch, 
> MAPREDUCE-735-v1.4-branch-0.20.patch, MAPREDUCE-735-v1.4.patch, 
> MAPREDUCE-735-v1.5.patch
>
>
> KeyFieldBasedPartitioner throws ArrayIndexOutOfBoundsException when some part 
> of the specified key is missing.
> Scenario :
> =======
> When the value of num.key.fields.for.partition is greater than the number of 
> separators present in the input.
> Command:
> ========
> hadoop jar streaming.jar -Dmapred.reduce.tasks=3 
> -Dnum.key.fields.for.partition=5 -input <input-dir>  -output <output-dir> 
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper -reducer 
> org.apache.hadoop.mapred.lib.IdentityReducer -inputformat 
> org.apache.hadoop.mapred.KeyValueTextInputFormat -partitioner 
> org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
