Greetings,

This question is inspired by this thread on the user list:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/[EMAIL PROTECTED]
Basically, there seems to be a lot of trouble using s3 as 'fs.default.name'. I'm trying (quoting for convenience):

> <property>
>   <name>fs.default.name</name>
>   <value>s3://$HDFS_BUCKET</value>
> </property>
>
> <property>
>   <name>fs.s3.awsAccessKeyId</name>
>   <value>$AWS_ACCESS_KEY_ID</value>
> </property>
>
> <property>
>   <name>fs.s3.awsSecretAccessKey</name>
>   <value>$AWS_SECRET_ACCESS_KEY</value>
> </property>
>
> On startup of the cluster, with the bucket name containing no
> non-alphabetic characters, I get:
>
> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.RuntimeException: Not a host:port pair: XXXXX
>         at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> If I use this style of configuration:
>
> <property>
>   <name>fs.default.name</name>
>   <value>s3://$AWS_ACCESS_KEY:[EMAIL PROTECTED]</value>
> </property>
>
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "[EMAIL PROTECTED]"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Integer.parseInt(Integer.java:447)
>         at java.lang.Integer.parseInt(Integer.java:497)
>         at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)

Now, I've gotten distcp to work, but I can't get Hadoop fired up using S3 as its storage medium. I'm a neophyte when it comes to this codebase, but a look at the implementations of distcp (o.a.h.util.CopyFiles) and, say, NameNode (o.a.h.dfs.NameNode) seems to indicate that the paths are handled very differently. Specifically (and I'm a bit out of my depth here), it looks like NameNode#initialize gets passed a String version of the authority portion of the 'fs.default.name' URI and tries to create a socket address with it, whereas CopyFiles.setup asks for a FileSystem for the specified Path. (See the rough sketch at the end of this mail for my reading of the difference.)

CopyFiles makes sense to me - I can see how the FileSystem is created, etc. NameNode doesn't make sense to me - shouldn't a FileSystem be created from fs.default.name instead of "blindly" creating a socket address?

Is this a bug, or am I completely off base here? If I'm off base, can someone give me an explanation of what I'm missing or point me in the right direction? If this does seem like a bug, what suggestions do you have for ways to address it? I'm happy to code it up, but, like I say, I'm new here ;-)

Any help is appreciated.

-lincoln

--
lincolnritter.com
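P.S. In case it helps show what I mean, here is a rough, stripped-down sketch of my reading of the two code paths. This is my own illustration, not the actual Hadoop source; the class name, bucket, and keys below are made-up placeholders:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.net.NetUtils;

    // Illustration only: contrasts host:port parsing of fs.default.name
    // (as NameNode seems to do) with a scheme-based FileSystem lookup
    // (as CopyFiles/distcp seems to do).
    public class S3DefaultFsSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "s3://mybucket");           // placeholder bucket
        conf.set("fs.s3.awsAccessKeyId", "MY_ACCESS_KEY");      // placeholder key
        conf.set("fs.s3.awsSecretAccessKey", "MY_SECRET_KEY");  // placeholder secret

        // NameNode-style: treat the authority of fs.default.name as host:port.
        // For "s3://mybucket" the authority is just "mybucket", so this throws
        // "RuntimeException: Not a host:port pair: mybucket" -- the error I'm seeing.
        try {
          URI defaultUri = new URI(conf.get("fs.default.name"));
          NetUtils.createSocketAddr(defaultUri.getAuthority());
        } catch (RuntimeException e) {
          System.out.println("host:port parsing fails: " + e.getMessage());
        }

        // CopyFiles/distcp-style: ask for the FileSystem behind the Path; the
        // factory picks the implementation from the "s3" scheme (this needs real
        // credentials above to actually initialize against S3).
        FileSystem fs = new Path("s3://mybucket/").getFileSystem(conf);
        System.out.println("FileSystem for s3:// is " + fs.getClass().getName());
      }
    }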
