[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ]
Tom White commented on HADOOP-5805:
-----------------------------------
This looks like a good fix. The test should assert that it gets back an
appropriate FileStatus object.
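Something like the following, say (a rough sketch only: it assumes a
contract-test-style class where fs is the NativeS3FileSystem under test, and
the method name is made up):
{code}
// Rough sketch only: assumes a contract-test-style class where "fs" is
// the NativeS3FileSystem under test; the method name is made up.
public void testGetFileStatusOnBucketRoot() throws Exception {
  // A bucket-only URI like s3n://test-bucket has an empty path component,
  // which is exactly the case the patch is fixing.
  FileStatus status = fs.getFileStatus(new Path(fs.getUri().toString()));
  assertNotNull("expected a FileStatus for the bucket root", status);
  assertTrue("bucket root should be a directory", status.isDir());
}
{code}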
The patch needs to be regenerated since the tests have moved from src/test to
src/test/core.
For the second problem, you could subclass your output format to override
checkOutputSpecs() so that it doesn't throw FileAlreadyExistsException. But I
agree it would be nicer to deal with this generally. Perhaps open a separate
Jira, as it would affect more than NativeS3FileSystem.
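For illustration, such an override might look roughly like this (the class
name is made up, and it uses the old org.apache.hadoop.mapred API that appears
in the stack trace below):
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

// Made-up subclass: skips the exists() check in
// FileOutputFormat.checkOutputSpecs() that throws FileAlreadyExistsException
// (and that trips over bucket-root paths via FileSystem.exists()).
public class LenientTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public void checkOutputSpecs(FileSystem ignored, JobConf job)
      throws IOException {
    // Still fail fast if no output directory was configured at all.
    if (FileOutputFormat.getOutputPath(job) == null) {
      throw new InvalidJobConfException("Output directory not set in JobConf.");
    }
    // Deliberately no "output directory already exists" check here.
  }
}
{code}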
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch
>
>
> When I specify top-level S3 buckets as input or output directories, I get the
> following exception:
>
> hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
>
> java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
>     at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>     at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> The workaround is to specify input/output buckets with sub-directories:
>
> hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
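
For context, the IllegalArgumentException in the trace comes from
NativeS3FileSystem.pathToKey() rejecting a bucket-only URI, whose path
component is empty. The kind of guard a fix would add looks roughly like this
(a sketch only; the attached patch is the authoritative version):
{code}
// Sketch: map a bucket-only URI such as s3n://mybucket to the root key.
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // A URI with no path after the bucket refers to the bucket root.
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath().substring(1); // strip the leading slash
}
{code}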
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.