Hi all,
I'm trying to get Hadoop 0.14.0 to talk to S3 but it's not working out
for me. My credentials are definitely right (checked multiple times
and that's not the issue in any case) and the bucket I'm querying
definitely does exist already and has keys under it.
However, when I try to access S3, I see some strange behavior.
If I try this:
bin/hadoop fs -ls key-with-stuff-under-it
it fails. I set it to DEBUG and I can see that the S3 library is
trying to query:
GET /BUCKET/%2Fuser%2Froot%2Fkey-with-stuff-under-it
which is wrong, since the key is prefixed with the URL-encoded working
directory. I can sort of understand it, though, as the default value of
the workingDir instance variable in S3FileSystem.java is
"/user/${user.name}" (anybody know where this default comes from, btw?
It makes no sense to me as is).
When I try the full URI:
bin/hadoop fs -ls s3://BUCKET/key-with-stuff-under-it
it still fails, this time doing something even stranger and trying to query:
GET /BUCKET/%2Fkey-with-stuff-under-it
Note the embedded %2F in there, which makes it incorrect.
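
To make the two request paths above concrete, here is a minimal Python
sketch of what I assume is happening: the path is made absolute against
the working directory and then URL-encoded as one opaque key, slashes
included. The helper name s3_request_path is hypothetical and this is
only my guess at the behavior, not the actual S3FileSystem code.

```python
import posixpath
from urllib.parse import quote


def s3_request_path(bucket, path, working_dir="/user/root"):
    """Hypothetical helper: a guess at how the GET path gets built."""
    # Relative paths are resolved against the working directory;
    # an absolute path replaces the working directory entirely.
    absolute = posixpath.join(working_dir, path)
    # safe="" means "/" is escaped too, which produces the %2F sequences.
    return "/" + bucket + "/" + quote(absolute, safe="")


# Relative path, as in "fs -ls key-with-stuff-under-it"
print(s3_request_path("BUCKET", "key-with-stuff-under-it"))
# Absolute path, as in "fs -ls s3://BUCKET/key-with-stuff-under-it"
print(s3_request_path("BUCKET", "/key-with-stuff-under-it"))
```

Under those assumptions, the two calls reproduce the two GET paths I see
in the DEBUG output.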
It also does this same thing whether I embed the credentials in the
full s3:// URI or not.
Then, I thought maybe it doesn't want a bucket at all in
fs.default.name, so I took that out and then the configuration failed
to parse on startup, throwing an IllegalArgumentException, telling me
it was expecting an authority portion to the URI it was given.
Are the docs on the wiki and Amazon's articles wrong about how to
configure hadoop-site.xml for S3 as a filesystem? Or am I doing
something wrong in my config, leaving something out, etc? Any
thoughts? Should I be setting the workingDir somewhere in the config
to something more sensible? I couldn't find much about this on the
mailing list or the forums at Amazon so I thought I'd just ask. My
config info is below.
Any help is greatly appreciated. Thanks :)
My mapred-default.xml is the default one from
hadoop-0.14.0/src/contrib/ec2/image/hadoop-init. I am running this on
EC2 with just MapReduce (no HDFS NameNodes/DataNodes running). Here's
my hadoop-site.xml (anonymized, of course):
------------------------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/hadoop</value>
</property>
<property>
<name>fs.default.name</name>
<value>s3://BUCKET</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>XXXXXXXXXXXXXXXXXX</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>XXXXXXXXXXXXXXXXXX</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>$MASTER_HOST:50002</value>
</property>
</configuration>
------------------------------------
--
Toby DiPasquale