[jira] Commented: (MAPREDUCE-2229) Initialize reader in Sort example

Todd Lipcon (JIRA) Wed, 22 Dec 2010 22:13:26 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974536#action_12974536
 ]


Todd Lipcon commented on MAPREDUCE-2229:
----------------------------------------

Is this the same issue as MAPREDUCE-1820?

> Initialize reader in Sort example
> ---------------------------------
>
>                 Key: MAPREDUCE-2229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2229
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.21.0
>            Reporter: Alexis
>
> As described in the "Total Sort" section of the HTDG book (page 223), I 
> tried to create a Hadoop job that globally sorts some input, using 
> InputSampler with TotalOrderPartitioner.
> To reproduce the exception, run the MapReduce Sort example with the 
> following arguments:
> {noformat}
> org.apache.hadoop.examples.Sort
>       -r 2
>       -outKey org.apache.hadoop.io.Text
>       -outValue org.apache.hadoop.io.Text
>       -inFormat org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
>       -outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
>       -totalOrder 0.1 10000 10
>       test/sortInput
>       test/sortOutput
> {noformat}
> The issue has already been described here:
> - http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201011.mbox/%[email protected]%3e
> - http://www.mail-archive.com/[email protected]/msg01372.html
> This is a somewhat related comment:
> http://www.mail-archive.com/[email protected]/msg03947.html
> We need to initialize the reader to avoid the NPE that occurs when 
> generating the partition file:
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>       at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
>       at org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:91)
>       at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
>       at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
>       at org.apache.hadoop.examples.Sort.run(Sort.java:166)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>       at org.apache.hadoop.examples.Sort.main(Sort.java:192)
> {noformat} 
> Right now, this initialization only happens in runNewMapper in 
> org.apache.hadoop.mapred.MapTask, but the sampling is performed before the 
> job starts, so the reader used for sampling is never initialized. The 
> TeraInputFormat class used by TeraSort has its own writePartitionFile 
> method. This is the javadoc comment of the createRecordReader method in the 
> InputFormat class:
> {noformat}
>    * Create a record reader for a given split. The framework will call
>    * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
>    * the split is used.
> {noformat}
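
The failure mode described in the report can be illustrated with a small, self-contained sketch. This is not Hadoop code and not the proposed patch: `LineReader` and `sample` are hypothetical stand-ins for a record reader and for the sampler loop, written only to show why a reader that is created but never initialized throws a NullPointerException on its first read, and how initializing it before sampling avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReaderInitSketch {

    /** Hypothetical stand-in for a record reader; not a Hadoop class. */
    static final class LineReader {
        private Iterator<String> lines; // stays null until initialize() is called
        private String current;

        void initialize(List<String> split) {
            lines = split.iterator();
        }

        boolean nextKeyValue() {
            // Dereferences `lines`, so this throws NullPointerException
            // if initialize() was skipped.
            if (!lines.hasNext()) {
                return false;
            }
            current = lines.next();
            return true;
        }

        String getCurrentKey() {
            return current;
        }
    }

    /** Hypothetical sampler loop that initializes the reader itself before reading. */
    static List<String> sample(LineReader reader, List<String> split, int maxSamples) {
        reader.initialize(split); // the step that is missing when sampling runs outside a map task
        List<String> keys = new ArrayList<>();
        while (keys.size() < maxSamples && reader.nextKeyValue()) {
            keys.add(reader.getCurrentKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("b", "a", "c");

        try {
            new LineReader().nextKeyValue(); // no initialize() call
        } catch (NullPointerException e) {
            System.out.println("uninitialized reader: NullPointerException");
        }

        System.out.println("sampled keys: " + sample(new LineReader(), split, 2));
    }
}
```

In the real code, the analogous step would presumably be a call to RecordReader#initialize(InputSplit, TaskAttemptContext) on the reader returned by createRecordReader before the sampler reads from it; where exactly that call belongs is for the patch to decide.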

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
