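Premal,

I didn't read through your entire thread, but the right order is: "map" (N) -> "partition" (N) -> "combine" (0…N). Your stack trace shows the same thing: the combiner is invoked from MapOutputBuffer.sortAndSpill, which runs after each map output record has been assigned to a partition.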
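To make the ordering concrete, here is a rough sketch in plain Python of what happens on the map side. It is a toy model, not Hadoop code: the function names, the toy hash, and the sample keys are all made up for illustration, and it assumes the combiner simply sums counts per key.

```python
from collections import defaultdict


def map_phase(records):
    """Stand-in for mapper.py: emit (key, 1) for every input record."""
    for rec in records:
        yield rec, 1


def partition(key, num_reducers):
    """Choose a reducer from the field before the first '|' (similar in spirit to -k1,1)."""
    first_field = key.split("|", 1)[0]
    # Toy deterministic hash; Hadoop uses its own hash function.
    return sum(first_field.encode()) % num_reducers


def combine(pairs):
    """Sum the values per key inside a single partition."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_map_side(records, num_reducers):
    """map -> partition -> combine, in that order."""
    buckets = defaultdict(list)
    for key, value in map_phase(records):
        # Each record gets its partition as soon as the mapper emits it.
        buckets[partition(key, num_reducers)].append((key, value))
    # The combiner only ever sees records that are already grouped by partition.
    return {part: combine(pairs) for part, pairs in buckets.items()}


if __name__ == "__main__":
    sample = ["geo_US|1311722400|E"] * 100 + ["geo_UK|1311722400|E"] * 3
    for part, totals in run_map_side(sample, 8).items():
        print(part, totals)
```

Since the combiner runs per partition and may be skipped or repeated, it should never change which partition a record belongs to.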
On Sat, Aug 6, 2011 at 4:04 AM, Premal <[email protected]> wrote:
>
> According to the attached image found on Yahoo's Hadoop tutorial, the order
> of operations is map > combine > partition, which should be followed by
> reduce.
>
> Here is an example key emitted by the map operation:
>
> LongValueSum:geo_US|1311722400|E 1
>
> Assuming there are 100 keys of the same type, this should get combined as
>
> geo_US|1311722400|E 100
>
> Then I'd like to partition the keys by the value before the first pipe (|)
> http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29
>
> geo_US
>
> Here's the streaming command:
>
> hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
> -D mapred.reduce.tasks=8 \
> -D stream.num.map.output.key.fields=1 \
> -D mapred.text.key.partitioner.options=-k1,1 \
> -D stream.map.output.field.separator=\| \
> -file mapper.py \
> -mapper mapper.py \
> -file reducer.py \
> -reducer reducer.py \
> -combiner org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer \
> -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
> -input input_file \
> -output output_path
>
> This is the error I get:
>
> java.lang.NumberFormatException: For input string: "1311722400|E 1"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Long.parseLong(Long.java:419)
> at java.lang.Long.parseLong(Long.java:468)
> at org.apache.hadoop.mapred.lib.aggregate.LongValueSum.addNextValue(LongValueSum.java:48)
> at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:59)
> at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:35)
> at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1349)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> It looks like the partitioner is running before the combiner. Any thoughts?
> --
> View this message in context:
> http://old.nabble.com/Hadoop-order-of-operations-tp32205781p32205781.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>

--
Harsh J
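
P.S. One plausible reading of the NumberFormatException, sketched in Python below: with stream.map.output.field.separator set to "|" and stream.num.map.output.key.fields=1, streaming would cut each mapper output line at the first pipe, so everything after it becomes the value that LongValueSum then tries to parse as a long. The helper split_streaming_line is hypothetical and only approximates the split; it also assumes the mapper separates the key from the count with a tab.

```python
def split_streaming_line(line, separator="\t", num_key_fields=1):
    """Approximate how a mapper output line is split into (key, value).

    The key is the first `num_key_fields` separator-delimited fields and
    the value is everything after that separator.
    """
    pos = -1
    for _ in range(num_key_fields):
        pos = line.find(separator, pos + 1)
        if pos == -1:
            # Fewer separators than requested: the whole line is the key.
            return line, ""
    return line[:pos], line[pos + len(separator):]


# Assumed mapper output line (tab between the key and the count).
line = "LongValueSum:geo_US|1311722400|E\t1"

# With the settings from the command in the quoted message.
key, value = split_streaming_line(line, separator="|", num_key_fields=1)
print(repr(key), repr(value))

try:
    int(value)
except ValueError as err:
    # Python's counterpart to the NumberFormatException in the trace.
    print("cannot parse value:", err)

# With the default tab separator, the value is just the count.
key, value = split_streaming_line(line)
print(repr(key), int(value))
```

If that reading is right, one thing to try is leaving stream.map.output.field.separator at its default and pointing only the partitioner at the pipe with -D map.output.key.field.separator=\|. I have not tested this against your job.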
