So far, I've had no luck. Can anyone out there clarify the permissible characters/format for aws keys and bucket names?
I haven't looked at the code here, but it seems strange to me that the same restrictions on host/port etc. apply, given that it's a totally different system. I'd love to see exceptions thrown that are particular to the protocol/subsystem being employed. The s3 'handler' (or whatever) might be nice enough to check for format violations and throw an appropriate exception, for instance. It might URL-encode the secret key so that the user doesn't have to worry about this, or throw an exception notifying the user of a bad format. Currently, apparent problems with my s3 settings are throwing exceptions that give no indication that the problem is actually with those settings.

My mitigating strategy has been to change my configuration to use "instance-local" storage (/mnt). I then copy the results out to s3 using 'distcp'. This is odd since distcp seems ok with my s3/aws info.

I'm still unclear as to the permissible characters in bucket names and access keys. I gather '/' is bad in the secret key and that '_' is bad for bucket names. Thus far I have only been able to get buckets to work in distcp that have only letters in their names, but I haven't tested too extensively.

For example, I'd love to use buckets like: 'com.organization.hdfs.purpose'. This seems to fail. Using 'comorganizationhdfspurpose' works but clearly that is less than optimal.

Like I say, I haven't dug into the source yet, but it is curious that distcp seems to work (at least where s3 is the destination) and hadoop fails when s3 is used as its storage.

Anyone who has dealt with these issues, please post! It will help make the project better.

-lincoln

--
lincolnritter.com

On Wed, Jul 9, 2008 at 7:10 AM, slitz <[EMAIL PROTECTED]> wrote:

> I'm having the exact same problem, any tip?
>
> slitz
>
> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <[EMAIL PROTECTED]>
> wrote:
>
>> Hello,
>>
>> I am trying to use S3 with Hadoop 0.17.0 on EC2.
>> Using this style of configuration:
>>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>s3://$HDFS_BUCKET</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3.awsAccessKeyId</name>
>>   <value>$AWS_ACCESS_KEY_ID</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3.awsSecretAccessKey</name>
>>   <value>$AWS_SECRET_ACCESS_KEY</value>
>> </property>
>>
>> on startup of the cluster with the bucket having no non-alphabetic
>> characters, I get:
>>
>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>     at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>     at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> If I use this style of configuration:
>>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>s3://$AWS_ACCESS_KEY:[EMAIL PROTECTED]</value>
>> </property>
>>
>> I get (where the all-caps portions are the actual values...):
>>
>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For input string:
>> "[EMAIL PROTECTED]"
>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>     at java.lang.Integer.parseInt(Integer.java:447)
>>     at java.lang.Integer.parseInt(Integer.java:497)
>>     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>     at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>     at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>     at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> These exceptions are taken from the namenode log. The datanode logs
>> show the same exceptions.
>>
>> Other than the above configuration changes, the configuration is
>> identical to that generated by the hadoop image creation script found
>> in the 0.17.0 distribution.
>>
>> Can anybody point me in the right direction here?
>>
>> -lincoln
>>
>> --
>> lincolnritter.com
>>
>
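
To make the "check the format and fail with a useful message" idea concrete, here is a minimal Python sketch of what such a pre-flight check could look like. It is not Hadoop code. The function names (bucket_problems, s3_uri), the sample credentials, and the bucket rule itself (lowercase letters, digits and hyphens in dot-separated labels) are all assumptions made for illustration, and I have not verified that Hadoop 0.17.0 accepts a percent-encoded secret key in the URI.

```python
import re
from urllib.parse import quote

# Hypothetical, deliberately conservative rule: each dot-separated label may
# contain only lowercase letters, digits and hyphens, and must start and end
# with a letter or digit. This is an assumption, not the documented S3 or
# Hadoop rule.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def bucket_problems(bucket):
    """Return a list of problems found; an empty list means the name passed."""
    problems = []
    for label in bucket.split("."):
        if not _LABEL.match(label):
            problems.append("label %r is not hostname-safe" % label)
    return problems


def s3_uri(bucket, access_key, secret_key):
    """Build an s3:// URI, percent-encoding the credentials.

    Raises ValueError naming the bucket, so the failure points at the
    S3 settings instead of surfacing as a host:port parsing error.
    """
    problems = bucket_problems(bucket)
    if problems:
        raise ValueError("bad S3 bucket name %r: %s" % (bucket, "; ".join(problems)))
    # safe="" makes quote() encode "/" as %2F instead of leaving it alone.
    user = quote(access_key, safe="")
    secret = quote(secret_key, safe="")
    return "s3://%s:%s@%s" % (user, secret, bucket)


if __name__ == "__main__":
    # Placeholder credentials, not real keys.
    print(s3_uri("examplebucket", "AKIDEXAMPLE", "abc/def+ghi"))
    print(bucket_problems("my_bucket"))
```

Under this sketch's rule a name like 'com.organization.hdfs.purpose' passes, so the check alone would not explain why that bucket fails for me. It only shows how a dedicated error could point at the s3 settings.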
