[ https://issues.apache.org/jira/browse/HADOOP-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977986#comment-13977986 ]
Steve Loughran commented on HADOOP-10533:
-----------------------------------------

Looking at this code, the only way the inner {{in}} stream can be null is if the {{store.retrieve(key)}} operation returned null. It does that if an exception that is *not an instance or subclass of IOException* is raised while retrieving the data, or if the HTTP response itself was null. So either:
# Something is going wrong with the GET, and that exception is being logged at debug level and then discarded, or
# the HTTP request is returning a null response and this is not being picked up.

Having the constructor and seek() operations check for null values and raise an IOE is the solution here; making close() more robust is wise, but does not address the real problem. (A rough sketch of such a check follows the quoted issue text below.)

> S3 input stream NPEs in MapReduce job
> --------------------------------------
>
>                 Key: HADOOP-10533
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10533
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 1.0.0, 1.0.3
>        Environment: Hadoop with default configurations
>            Reporter: Benjamin Kim
>            Priority: Minor
>
> I'm running a wordcount MR as follows
> hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input s3n://bucket/wordcount/output
>
> s3n://bucket/wordcount/input is an s3 object that contains other input files.
> However I get the following NPE error
> 12/10/02 18:56:23 INFO mapred.JobClient:  map 0% reduce 0%
> 12/10/02 18:56:54 INFO mapred.JobClient:  map 50% reduce 0%
> 12/10/02 18:56:56 INFO mapred.JobClient: Task Id : attempt_201210021853_0001_m_000001_0, Status : FAILED
> java.lang.NullPointerException
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
>         at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
>         at java.io.FilterInputStream.close(FilterInputStream.java:155)
>         at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> MR runs fine if I specify a more specific input path such as s3n://bucket/wordcount/input/file.txt
> MR fails if I pass an s3 folder as a parameter
> In summary,
> This works:
> hadoop jar ./hadoop-examples-1.0.3.jar wordcount /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/
> This doesn't work:
> hadoop jar ./hadoop-examples-1.0.3.jar wordcount s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/
> (both input paths are directories)
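For illustration only, here is a minimal, self-contained sketch of the kind of null check proposed above. This is not the actual {{NativeS3FileSystem}} code or the eventual patch: the class name, fields, stub store interface and exception messages are stand-ins; only the {{store.retrieve(key)}} call shape and the fail-fast-on-null idea come from the comment.

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: raise an IOE at open/seek time instead of letting a null stream
// surface later as an NPE in read()/close().
class NullGuardedS3InputStream extends InputStream {

  /** Stand-in for Hadoop's native store; only what this sketch needs. */
  interface StoreStub {
    InputStream retrieve(String key) throws IOException;
    InputStream retrieve(String key, long byteRangeStart) throws IOException;
  }

  private final StoreStub store;
  private final String key;
  private InputStream in;

  NullGuardedS3InputStream(StoreStub store, String key) throws IOException {
    this.store = store;
    this.key = key;
    this.in = store.retrieve(key);
    // Fail fast: retrieve() returns null when it swallowed a non-IOException
    // or got a null HTTP response.
    if (this.in == null) {
      throw new FileNotFoundException("No data found for key " + key);
    }
  }

  public void seek(long newPos) throws IOException {
    in.close();
    in = store.retrieve(key, newPos);
    if (in == null) {
      // Same failure mode as the constructor, but mid-read.
      throw new IOException("Null stream returned seeking " + key + " to " + newPos);
    }
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public void close() throws IOException {
    // Defensive only; with the checks above, in can no longer be null here.
    if (in != null) {
      in.close();
    }
  }
}
{code}

With checks like these, a failed or empty GET surfaces as an IOException at the point the stream is opened or repositioned, which the MapReduce framework can report and retry, rather than as an NPE from {{close()}} long after the cause has been discarded.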