Shawn Smith created CRUNCH-47:
---------------------------------
Summary: Inputs and outputs can't use non-default Hadoop FileSystem
Key: CRUNCH-47
URL: https://issues.apache.org/jira/browse/CRUNCH-47
Project: Crunch
Issue Type: Bug
Components: IO
Affects Versions: 0.3.0
Environment: Elastic MapReduce Hadoop 1.0.3
Reporter: Shawn Smith
I'm getting the following exception when running Crunch on Elastic MapReduce,
where the input and output files use the Native S3 FileSystem and intermediate
files use HDFS. HDFS is configured as the default file system:
Exception in thread "main" java.lang.IllegalArgumentException: This file system
object (hdfs://10.114.37.65:9000) does not support access to the request path
's3n://test-bucket/test/Input.avro' You possibly called FileSystem.get(conf)
when you should have called FileSystem.get(uri, conf) to obtain a file system
supporting your path.
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:767)
at org.apache.crunch.io.SourceTargetHelper.getPathSize(SourceTargetHelper.java:44)
It looks like Crunch has a number of calls to FileSystem.get(Configuration)
that assume the default configured file system, and these fail whenever an
input or output lives on S3.
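A Hadoop-free sketch of the underlying problem (class and method names here are hypothetical, written only to illustrate the check): a FileSystem instance rejects paths whose scheme or authority differ from its own, which is exactly what produced the exception above. The fix pattern in Crunch would be to resolve the file system from the path itself, i.e. FileSystem.get(path.toUri(), conf) rather than FileSystem.get(conf):

```java
import java.net.URI;

// Hypothetical sketch of Hadoop's FileSystem.checkPath() comparison, which
// throws when a path belongs to a different file system than the instance.
public class CheckPathSketch {
    // A file system rooted at fsUri only "supports" paths whose scheme
    // (and authority, if present) match its own.
    static boolean supportsPath(URI fsUri, URI path) {
        String pathScheme = path.getScheme();
        if (pathScheme == null) {
            return true; // scheme-less paths resolve against the fs itself
        }
        if (!pathScheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String pathAuthority = path.getAuthority();
        return pathAuthority == null
                || pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://10.114.37.65:9000");
        URI input = URI.create("s3n://test-bucket/test/Input.avro");

        // The default (hdfs) file system rejects the s3n input path...
        System.out.println(supportsPath(defaultFs, input));
        // ...but a file system resolved from the path's own URI accepts it,
        // which is what FileSystem.get(uri, conf) arranges.
        System.out.println(supportsPath(URI.create("s3n://test-bucket/"), input));
    }
}
```

This is why the error message itself suggests FileSystem.get(uri, conf): only a file system resolved from the path's URI can pass the check.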
Also, CrunchJob.handleMultiPaths() calls FileSystem.rename(), which only works
when the source and destination are on the same file system. This breaks the
final upload of the output files from HDFS to S3.
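The rename issue can be sketched the same way: rename() is a per-file-system operation, so a move from hdfs:// to s3n:// has to fall back to a copy followed by a delete (in Hadoop, roughly FileUtil.copy(srcFs, src, dstFs, dst, true, conf)). A minimal, hypothetical version of the dispatch decision:

```java
import java.net.URI;

// Hypothetical sketch of the choice CrunchJob.handleMultiPaths() would need:
// rename() within one file system, copy-then-delete across file systems.
public class MovePlanSketch {
    enum Plan { RENAME, COPY_THEN_DELETE }

    // Two paths live on the same file system iff scheme and authority match.
    static Plan planFor(URI src, URI dst) {
        boolean sameScheme = String.valueOf(src.getScheme())
                .equalsIgnoreCase(String.valueOf(dst.getScheme()));
        boolean sameAuthority = String.valueOf(src.getAuthority())
                .equalsIgnoreCase(String.valueOf(dst.getAuthority()));
        return sameScheme && sameAuthority ? Plan.RENAME : Plan.COPY_THEN_DELETE;
    }

    public static void main(String[] args) {
        // The final HDFS -> S3 output upload cannot use rename().
        System.out.println(planFor(
                URI.create("hdfs://10.114.37.65:9000/tmp/out/part-00000"),
                URI.create("s3n://test-bucket/output/part-00000")));
        // prints COPY_THEN_DELETE
    }
}
```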
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira