Sorry for the time taken to respond, I've been doing some tests on this. Your workaround worked like a charm, thank you :) Now I'm able to fetch the data from S3, process it using HDFS, and put the results in S3.

About problem a) that I mentioned in my previous email: I now understand the error. I was starting the namenode and datanodes and changing fs.default.name to s3://bucket/ after that; now I understand why it doesn't work.

Thank you *very* much for your help, now I can use EC2 and S3 :)

slitz

On Fri, Jul 11, 2008 at 10:46 PM, Tom White <[EMAIL PROTECTED]> wrote:

> On Fri, Jul 11, 2008 at 9:09 PM, slitz <[EMAIL PROTECTED]> wrote:
> > a) Use S3 only, without HDFS and configuring fs.default.name as s3://bucket
> > -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
> > java.lang.RuntimeException: Not a host:port pair: XXXXX
>
> What command are you using to start Hadoop?
>
> > b) Use HDFS as the default FS, specifying S3 only as input for the first Job
> > and output for the last (assuming one has multiple jobs on same data)
> > -> PROBLEM: https://issues.apache.org/jira/browse/HADOOP-3733
>
> Yes, this is a problem. I've added a comment to the Jira description
> describing a workaround.
>
> Tom
