Deepak Subhramanian created CRUNCH-220:
------------------------------------------
Summary: Crunch
Key: CRUNCH-220
URL: https://issues.apache.org/jira/browse/CRUNCH-220
Project: Crunch
Issue Type: Bug
Components: IO
Affects Versions: 0.6.0
Environment: Cloudera Hadoop with Amazon S3
Reporter: Deepak Subhramanian
Priority: Minor
I am trying to use Crunch to read a file from S3 and write the result back
to S3. Reading the file works, but writing to S3 fails with an error. I am
not sure whether this is a bug or whether I am missing a Hadoop configuration
setting. I can read from S3 and write to a local file or to HDFS directly.
The code and the error are below. I am passing the S3 key and secret as
parameters.

PCollection<String> lines = pipeline.read(
    From.sequenceFile(inputdir, Writables.strings()));
PCollection<String> textline = lines.parallelDo(new DoFn<String, String>() {
    public void process(String line, Emitter<String> emitter) {
        if (headerNotWritten) {
            //emitter.emit("Writing Header");
            emitter.emit(table_header.getTable_header());
            emitter.emit(line);
            headerNotWritten = false;
        } else {
            emitter.emit(line);
        }
    }
}, Writables.strings()); // Indicates the serialization format
pipeline.writeTextFile(textline, outputdir);

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://bktname/testcsv, expected: hdfs://ip-address.compute.internal
[ip-addresscompute.amazonaws.com] out: at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:797)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.io.impl.FileTargetImpl.handleExisting(FileTargetImpl.java:133)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:212)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:200)
[ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.collect.PCollectionImpl.write(PCollectionImpl.java:132)
[ec2-79-125-102-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.writeTextFile(MRPipeline.java:356)
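
The trace shows DistributedFileSystem.checkPath rejecting an s3n:// path
during the existence check in FileTargetImpl.handleExisting, which suggests
the check ran against the cluster's default file system (HDFS) rather than
the file system the output path belongs to. The sketch below is a simplified,
Hadoop-free model of that kind of check, not Hadoop's or Crunch's actual
code: the class name, the method names, and the namenode.example:8020 address
are all hypothetical. In real Hadoop code, the analogous distinction is
between FileSystem.get(Configuration), which returns the configured default
file system, and Path.getFileSystem(Configuration), which returns the file
system matching the path's own scheme.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical, simplified model of a "does this path belong to this
// file system?" check. Real Hadoop logic handles more cases.
public class WrongFsSketch {

    // A path belongs to a file system if it has no scheme (it is then
    // resolved against that file system) or if scheme and authority match.
    public static boolean belongsTo(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
                && Objects.equals(path.getAuthority(), fsUri.getAuthority());
    }

    // Mirrors the shape of the reported failure: reject foreign paths.
    public static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    // Derive the file system URI from the path itself instead of assuming
    // the default one (the idea behind resolving the file system per path).
    public static URI fileSystemFor(URI defaultFs, URI path) {
        if (path.getScheme() == null) {
            return defaultFs;
        }
        String authority = path.getAuthority();
        return URI.create(path.getScheme() + "://"
                + (authority == null ? "/" : authority));
    }

    public static void main(String[] args) {
        // Assumed default file system; the real address is elided in the report.
        URI defaultFs = URI.create("hdfs://namenode.example:8020");
        URI output = URI.create("s3n://bktname/testcsv");

        // Resolving the file system from the path passes the check.
        checkPath(fileSystemFor(defaultFs, output), output);
        System.out.println("per-path file system: ok");

        // Checking the S3 path against the default file system fails.
        try {
            checkPath(defaultFs, output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading of the trace is right, reading from S3 would work because the
source side resolves its file system from the input path, while the write
side's existence check does not.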