[
https://issues.apache.org/jira/browse/MAPREDUCE-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730331#action_12730331
]
Iyappan Srinivasan commented on MAPREDUCE-735:
----------------------------------------------
Tested the scenarios below; all of them PASS.
Input used for the comparator scenarios:
3.6.2.8.9.12.43
3.6.1.8.9.12.43
3.6.6.8.9.12.43
3.6.5.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.2.8.9.12.43
3.6.9.8.9.12.43
3.6.3.8.9.12.43
3.6.1.8.9.12.43
3.6.5.8.9.12.43
3.6.2.8.9.12.43
3.6.1.8.9.12.43
1.7.8.6.3.2.4.7
1) Sort numerically on the third field, in reverse (descending) order.
bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1
-Dmapred.text.key.partitioner.options=-k1,1
-Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
-Dmap.output.key.field.separator=.
-Dmapred.text.key.comparator.options=-k3,3nr
-input input1/inputfile2 -mapper /bin/cat
-reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output2
Output:
3.6.9.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
1.7.8.6.3.2.4.7
3.6.6.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.3.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
2) Sort numerically on the third field in normal (ascending) order, with no reverse.
bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1
-Dmapred.text.key.partitioner.options=-k1,1
-Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
-Dmap.output.key.field.separator=.
-Dmapred.text.key.comparator.options=-k3,3n
-input input1/inputfile2 -mapper /bin/cat
-reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output3
Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
1.7.8.6.3.2.4.7
3.6.9.8.9.12.43
3) Sort on the 7th field (numeric, reversed), then within that result sort on the 3rd field (numeric).
bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1
-Dmapred.text.key.partitioner.options=-k1,1
-Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
-Dmap.output.key.field.separator=.
-Dmapred.text.key.comparator.options="-k7,7nr -k3,3n"
-input input1/inputfile2 -mapper /bin/cat
-reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output8
Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.9.8.9.12.43
1.7.8.6.3.2.4.7
4) Check that a global option (-n) stops applying to a key that carries its own local option (-k7,7r).
bin/hadoop jar hadoop-dev-streaming.jar -Dmapred.reduce.tasks=1
-Dmapred.text.key.partitioner.options=-k1,1
-Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
-Dmap.output.key.field.separator=.
-Dmapred.text.key.comparator.options="-n -k7,7r -k3,3n"
-input input1/inputfile2 -mapper /bin/cat
-reducer org.apache.hadoop.mapred.lib.IdentityReducer -output output15
Output:
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.1.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.2.8.9.12.43
3.6.3.8.9.12.43
3.6.5.8.9.12.43
3.6.5.8.9.12.43
3.6.6.8.9.12.43
3.6.8.8.9.12.43
3.6.8.8.9.12.43
3.6.9.8.9.12.43
1.7.8.6.3.2.4.7
5) When a field contains special characters such as "^", or letters such as "p",
instead of numbers, the sort still completes.
6) Splitting the file into two also gives correct results. The output is divided
into two parts and each part is sorted correctly, even for very large files.
This is true for all the options.
7) If the column being sorted contains "i", "^", " ", or "" (null), the record
should be placed at the end.
8) Adding "-Dnum.key.fields.for.partition=5" makes no difference and does not
cause any exception.
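As a rough model of what scenarios 1) to 4) check, here is a small Python sketch of multi-key sorting on separator-delimited fields. It is not the KeyFieldBasedComparator code. The function name `sort_records` and the `(field, numeric, reverse)` tuples are hypothetical stand-ins for the `-kN,N[n][r]` options, and it assumes every record has the requested field and that numeric fields hold integers. The order of records that tie on every key here follows Python's stable sort and is not something Hadoop guarantees.

```python
def sort_records(lines, key_specs, sep="."):
    """Sort lines by a list of (field, numeric, reverse) key specs.

    field is 1-based, like the N in -kN,N. The first spec is the primary
    key; later specs only break ties left by earlier ones.
    """
    result = list(lines)
    # Apply stable sorts from the least significant key to the most
    # significant one, so each key can have its own direction.
    for field, numeric, reverse in reversed(key_specs):
        def key(line, field=field, numeric=numeric):
            text = line.split(sep)[field - 1]
            return int(text) if numeric else text
        result.sort(key=key, reverse=reverse)
    return result


records = [
    "3.6.2.8.9.12.43",
    "3.6.9.8.9.12.43",
    "1.7.8.6.3.2.4.7",
    "3.6.1.8.9.12.43",
]

# Roughly -k3,3nr: numeric on field 3, reversed.
print(sort_records(records, [(3, True, True)]))

# Roughly "-k7,7nr -k3,3n": field 7 descending, then field 3 ascending.
print(sort_records(records, [(7, True, True), (3, True, False)]))
```

Passing `(7, False, True)` as the first spec models scenario 4), where the 7th field is compared as text because its local option replaces the global -n.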
Scenarios for KeyFieldBasedPartitioner:
1) bin/hadoop jar hadoop-streaming.jar
-Dmapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
-Dmapred.text.key.comparator.options="-k5,5" -Dmapred.reduce.tasks=2
-Dmapred.text.key.partitioner.options=-k5,5
-Dmap.output.key.field.separator=" "
-input input1/inputfile2 -mapper org.apache.hadoop.mapred.lib.IdentityMapper
-reducer org.apache.hadoop.mapred.lib.IdentityReducer
-inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner -output output8
It sorts text on the fifth field. I also tested other fields.
2) Even if "-Dnum.key.fields.for.partition=5" is added, it still works properly
without any exception.
3) If the column being sorted contains "i", "^", " ", or "", it sorts without
giving any errors.
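To illustrate why a key with too few fields should not break partitioning, here is a minimal Python sketch of partitioning on a range of key fields. It is only an illustration and not the KeyFieldBasedPartitioner implementation: the name `partition_for` is hypothetical, `zlib.crc32` is a stand-in for whatever hash Hadoop uses, and treating missing fields as an empty selection is an assumption rather than the behaviour confirmed by the patch.

```python
import zlib


def partition_for(key, start_field, end_field, num_partitions, sep=" "):
    """Pick a partition from fields start_field..end_field (1-based) of key."""
    fields = key.split(sep)
    # Slicing tolerates short keys: it returns fewer (or no) fields instead
    # of raising, unlike indexing a fixed position in the list.
    selected = sep.join(fields[start_field - 1:end_field])
    # Stand-in hash (assumption): any deterministic hash works for the sketch.
    return zlib.crc32(selected.encode("utf-8")) % num_partitions


# Keys that agree on field 5 land in the same partition.
print(partition_for("a b c d e f", 5, 5, 2))
print(partition_for("x y z w e q", 5, 5, 2))

# A key with only two fields is still assigned a partition.
print(partition_for("a b", 5, 5, 2))
```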
Some points to note:
1) If the "-rn" option is used anywhere instead of "-nr", it does not work. This
is as per the requirement.
2) If a -D option is misspelled, it is silently ignored. A configuration checker
for all the commands appears to be planned.
3) This feature is not documented anywhere. MAPREDUCE-753 has been raised for
it.
> ArrayIndexOutOfBoundsException is thrown by KeyFieldBasedPartitioner
> --------------------------------------------------------------------
>
> Key: MAPREDUCE-735
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-735
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Suman Sehgal
> Assignee: Amar Kamat
> Attachments: HADOOP-6130-v1.0.patch, MAPREDUCE-735-v1.2.patch,
> MAPREDUCE-735-v1.4-branch-0.20.patch, MAPREDUCE-735-v1.4.patch,
> MAPREDUCE-735-v1.5.patch
>
>
> KeyFieldBasedPartitioner throws "ArrayIndexOutOfBoundsException" when some part of
> the specified key is missing.
> Scenario :
> =======
> When the value of num.key.fields.for.partition is greater than the number of
> separators present in the input.
> Command:
> ========
> hadoop jar streaming.jar -Dmapred.reduce.tasks=3
> -Dnum.key.fields.for.partition=5 -input <input-dir> -output <output-dir>
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper -reducer
> org.apache.hadoop.mapred.lib.IdentityReducer -inputformat
> org.apache.hadoop.mapred.KeyValueTextInputFormat -partitioner
> org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner