[ https://issues.apache.org/jira/browse/MAPREDUCE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974536#action_12974536 ]
Todd Lipcon commented on MAPREDUCE-2229:
----------------------------------------
Is this MAPREDUCE-1820?
> Initialize reader in Sort example
> ---------------------------------
>
> Key: MAPREDUCE-2229
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2229
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: examples
> Affects Versions: 0.21.0
> Reporter: Alexis
>
> As described in the "Total Sort" section of the HTDG book (page 223), I tried
> to create a Hadoop job that globally sorts some input, using InputSampler with
> TotalOrderPartitioner.
> To reproduce the exception, run the MapReduce Sort example with the following
> arguments:
> {noformat}
> org.apache.hadoop.examples.Sort
> -r 2
> -outKey org.apache.hadoop.io.Text
> -outValue org.apache.hadoop.io.Text
> -inFormat org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
> -outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
> -totalOrder 0.1 10000 10
> test/sortInput
> test/sortOutput
> {noformat}
> The issue is already described here:
> - http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201011.mbox/%[email protected]%3e
> - http://www.mail-archive.com/[email protected]/msg01372.html
> This is a somewhat related comment:
> http://www.mail-archive.com/[email protected]/msg03947.html
> We need to initialize the reader to avoid the NPE that occurs when generating
> the partition file:
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
> at org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:91)
> at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
> at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
> at org.apache.hadoop.examples.Sort.run(Sort.java:166)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at org.apache.hadoop.examples.Sort.main(Sort.java:192)
> {noformat}
> Right now, this initialization only happens in runNewMapper in
> org.apache.hadoop.mapred.MapTask, but the sampling is performed before the
> job starts, so the reader used by the sampler is never initialized. The
> TeraInputFormat class used by TeraSort has its own writePartitionFile method.
> This is the Javadoc comment of the createRecordReader method in the
> InputFormat class:
> {noformat}
> * Create a record reader for a given split. The framework will call
> * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
> * the split is used.
> {noformat}
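For illustration only, the failure mode and the proposed fix described above can be sketched with plain Java stand-ins. Nothing here is Hadoop code: SketchReader, its methods, and sample are hypothetical names that only mimic the roles of RecordReader and InputSampler, on the assumption that the reader keeps a null field until initialize() has been called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderInitSketch {

    /** Hypothetical stand-in for a record reader; not the Hadoop class. */
    static class SketchReader {
        private Iterator<String> lines; // stays null until initialize() runs
        private String current;

        /** Opens the "split"; plays the role of RecordReader#initialize. */
        void initialize(List<String> split) {
            lines = split.iterator();
        }

        /** Throws NullPointerException if initialize() was never called. */
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                return false;
            }
            current = lines.next();
            return true;
        }

        String getCurrentKey() {
            return current;
        }
    }

    /** Collects up to numSamples keys, initializing each reader first. */
    static List<String> sample(List<List<String>> splits, int numSamples) {
        List<String> samples = new ArrayList<>();
        for (List<String> split : splits) {
            SketchReader reader = new SketchReader();
            reader.initialize(split); // the step the sampler was missing
            while (samples.size() < numSamples && reader.nextKeyValue()) {
                samples.add(reader.getCurrentKey());
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("b", "a"),
                Arrays.asList("d", "c"));
        System.out.println(sample(splits, 3));
    }
}
```

Removing the reader.initialize(split) line from sample makes the first nextKeyValue() call fail with a NullPointerException, which is the same shape of failure as in the stack trace above. In the real sampler the equivalent call would presumably be RecordReader#initialize(InputSplit, TaskAttemptContext), as the quoted Javadoc suggests.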