[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ]
Tom White commented on HADOOP-5805:
-----------------------------------
This looks like a good fix. The test should assert that it gets back an
appropriate FileStatus object.
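Something like the following, say (a rough sketch only: it assumes a
contract-test-style class where fs is the NativeS3FileSystem under test, and
the method name is made up):
{code}
// Rough sketch only: assumes a contract-test-style class where "fs" is
// the NativeS3FileSystem under test; the method name is made up.
public void testGetFileStatusOnBucketRoot() throws Exception {
  // A bucket-only URI like s3n://test-bucket has an empty path component,
  // which is exactly the case the patch is fixing.
  FileStatus status = fs.getFileStatus(new Path(fs.getUri().toString()));
  assertNotNull("expected a FileStatus for the bucket root", status);
  assertTrue("bucket root should be a directory", status.isDir());
}
{code}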
The patch needs to be regenerated since the tests have moved from src/test to
src/test/core.
For the second problem, you could subclass your output format to override
checkOutputSpecs() so that it doesn't throw FileAlreadyExistsException. But I
agree it would be nicer to deal with this generally. Perhaps open a separate
Jira, as it would affect more than NativeS3FileSystem.
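For illustration, such an override might look roughly like this (the class
name is made up, and it uses the old org.apache.hadoop.mapred API that appears
in the stack trace below):
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

// Made-up subclass: skips the exists() check in
// FileOutputFormat.checkOutputSpecs() that throws FileAlreadyExistsException
// (and that trips over bucket-root paths via FileSystem.exists()).
public class LenientTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public void checkOutputSpecs(FileSystem ignored, JobConf job)
      throws IOException {
    // Still fail fast if no output directory was configured at all.
    if (FileOutputFormat.getOutputPath(job) == null) {
      throw new InvalidJobConfException("Output directory not set in JobConf.");
    }
    // Deliberately no "output directory already exists" check here.
  }
}
{code}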
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch
>
>
> When I specify top-level S3 buckets as input or output directories, I get the
> following exception:
>
> hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
>
> java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
>     at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>     at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> The workaround is to specify input/output buckets with sub-directories:
>
> hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
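
For context, the IllegalArgumentException in the trace comes from
NativeS3FileSystem.pathToKey() rejecting a bucket-only URI, whose path
component is empty. The kind of guard a fix would add looks roughly like this
(a sketch only; the attached patch is the authoritative version):
{code}
// Sketch: map a bucket-only URI such as s3n://mybucket to the root key.
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // A URI with no path after the bucket refers to the bucket root.
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath().substring(1); // strip the leading slash
}
{code}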
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.