[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis resolved MAPREDUCE-2229.
-------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.22.0

> Initialize reader in Sort example
> ---------------------------------
>
>                 Key: MAPREDUCE-2229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2229
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.21.0
>            Reporter: Alexis
>             Fix For: 0.22.0
>
>
> As described in the "Total Sort" section of the HTDG book (page 223), I tried
> to create a Hadoop job that globally sorts some input, using InputSampler
> with TotalOrderPartitioner.
> Please run the MapReduce Sort example with the following arguments to
> reproduce the exception.
> {noformat}
> org.apache.hadoop.examples.Sort
>       -r 2
>       -outKey org.apache.hadoop.io.Text
>       -outValue org.apache.hadoop.io.Text
>       -inFormat org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
>       -outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
>       -totalOrder 0.1 10000 10
>       test/sortInput
>       test/sortOutput
> {noformat}
> The issue has already been described here:
> - http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201011.mbox/%[email protected]%3E
> - http://www.mail-archive.com/[email protected]/msg01372.html
> This is a somewhat related comment:
> http://www.mail-archive.com/[email protected]/msg03947.html
> We need to initialize the reader to avoid the NPE that occurs when generating
> the partition file:
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
>       at org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:91)
>       at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
>       at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
>       at org.apache.hadoop.examples.Sort.run(Sort.java:166)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>       at org.apache.hadoop.examples.Sort.main(Sort.java:192)
> {noformat} 
> Right now, this initialization only happens in runNewMapper in
> org.apache.hadoop.mapred.MapTask, but the sampling is performed before the
> job starts, so the reader used for sampling is never initialized. The
> TeraInputFormat class used by TeraSort has its own writePartitionFile
> method. This is the javadoc comment of the createRecordReader method in the
> InputFormat class:
> {noformat}
>    * Create a record reader for a given split. The framework will call
>    * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
>    * the split is used.
> {noformat}
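
The failure mode described in the quoted report can be sketched without any Hadoop dependency. The example below is only an illustration under stated assumptions: ToyLineReader and sampleKeys are hypothetical stand-ins written for this note, not Hadoop classes, and the sketch does not reproduce the actual InputSampler code or the fix that was committed. It shows a reader whose input stays null until initialize() is called, and a sampling loop that either initializes the reader first or skips that step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderInitSketch {

    /** Hypothetical stand-in for a record reader; not a Hadoop class. */
    static class ToyLineReader {
        private Iterator<String> in;   // stays null until initialize() is called
        private String key;

        void initialize(List<String> split) {
            in = split.iterator();
        }

        boolean nextKeyValue() {
            // Throws NullPointerException if initialize() was skipped.
            if (!in.hasNext()) {
                return false;
            }
            String line = in.next();
            int tab = line.indexOf('\t');
            // The key is the text before the first tab, or the whole line.
            key = tab < 0 ? line : line.substring(0, tab);
            return true;
        }

        String getCurrentKey() {
            return key;
        }
    }

    /**
     * Collects up to maxSamples keys from the split. Passing
     * initialize=false mimics sampling with an uninitialized reader.
     */
    static List<String> sampleKeys(List<String> split, int maxSamples, boolean initialize) {
        ToyLineReader reader = new ToyLineReader();
        if (initialize) {
            reader.initialize(split);
        }
        List<String> samples = new ArrayList<>();
        while (samples.size() < maxSamples && reader.nextKeyValue()) {
            samples.add(reader.getCurrentKey());
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("b\t2", "a\t1", "c\t3");
        System.out.println(sampleKeys(split, 2, true));
        try {
            sampleKeys(split, 2, false);
        } catch (NullPointerException e) {
            System.out.println("NPE without initialize()");
        }
    }
}
```

In Hadoop itself, the corresponding step is the RecordReader#initialize(InputSplit, TaskAttemptContext) call named in the javadoc quoted above; how the sampler should obtain a suitable context is left open here.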

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
