Thanks for the reply.
I've heard the "regenerate" suggestion before, but for organizations
that use AWS all over the place this is a huge pain. I think it would
be better to come up with a more robust way of handling AWS info.
-lincoln
--
lincolnritter.com
On Wed, Jul 9, 2008 at 12:44 PM, Jimmy Lin <[EMAIL PROTECTED]> wrote:
> I've come across this problem before. My simple solution was to
> regenerate new keys until I got one without a slash... ;)
>
> -Jimmy
>
>> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
>>
>> With distcp, I found that using the URL format s3://ID:[EMAIL PROTECTED]/
>> did not work, even if I encoded the slash as "%2F". I got
>> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
>> ResponseCode=403, ResponseMessage=Forbidden"
>>
>> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
>> s3://BUCKET/ it worked.
>>
>> I have periods ('.') in my bucket name; that was not a problem.
>>
>> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
>> uses java.net.URI, which should take care of decoding the %2F.
>>
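
For reference, a minimal sketch of how java.net.URI on its own treats a
slash in the user-info part. The ID, secret, and bucket values are made-up
placeholders and the class name is only for illustration. This shows the
JDK's parsing, not what Hadoop or jets3t do with the result afterwards.

```java
import java.net.URI;

// Illustration only: ID, secret and bucket are made-up placeholders.
class UserInfoDemo {
    // User-info part with percent-escapes decoded, or null if there is none.
    static String decodedUserInfo(String uri) {
        return URI.create(uri).getUserInfo();
    }

    // Host part, or null if the authority is not of the form [user-info@]host[:port].
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        String encoded = "s3://ID:abc%2Fdef@bucket/";   // slash written as %2F
        String unencoded = "s3://ID:abc/def@bucket/";   // slash left as-is

        System.out.println(decodedUserInfo(encoded));   // ID:abc/def
        System.out.println(host(encoded));              // bucket

        // With a literal slash the authority ends at the first '/', so
        // neither a user-info part nor a host is recognized.
        System.out.println(decodedUserInfo(unencoded)); // null
        System.out.println(host(unencoded));            // null
    }
}
```

So at the java.net.URI level the %2F form does decode as expected, which
suggests the 403 comes from somewhere after this step.
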
>> -Stuart
>>
>>
>> On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
>> <[EMAIL PROTECTED]> wrote:
>>> So far, I've had no luck.
>>>
>>> Can anyone out there clarify the permissible characters/format for aws
>>> keys and bucket names?
>>>
>>> I haven't looked at the code here, but it seems strange to me that the
>>> same restrictions on host/port etc apply given that it's a totally
>>> different system. I'd love to see exceptions thrown that are
>>> particular to the protocol/subsystem being employed. The s3 'handler'
>>> (or whatever) might be nice enough to check for format violations and
>>> throw an appropriate exception, for instance. It might URL-encode
>>> the secret key so that the user doesn't have to worry about this, or
>>> throw an exception notifying the user of a bad format. Currently,
>>> apparent problems with my s3 settings are throwing exceptions that
>>> give no indication that the problem is actually with those settings.
>>>
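
A rough sketch of the kind of helper described in the paragraph above,
assuming nothing about Hadoop internals: S3UriBuilder and build() are
hypothetical names, not existing Hadoop classes. It lets java.net.URI
percent-encode the secret and turns a syntax failure into an exception
that names the S3 settings. Note that an encoded %2F was still reported
in this thread to give a 403 with distcp, so this only illustrates the
encoding and error-reporting step.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop.
class S3UriBuilder {
    // Builds s3://ID:SECRET@BUCKET/ and reports malformed input explicitly.
    static URI build(String accessKeyId, String secretKey, String bucket) {
        try {
            // The multi-argument constructor percent-encodes characters that
            // are illegal in the user-info part (such as '/') and requires
            // the host to be syntactically valid.
            return new URI("s3", accessKeyId + ":" + secretKey, bucket, -1, "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "Bad S3 settings for bucket '" + bucket + "': " + e.getReason(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials; the slash in the secret gets encoded.
        System.out.println(build("ID", "abc/def", "bucket")); // s3://ID:abc%2Fdef@bucket/

        try {
            build("ID", "abc/def", "my_bucket"); // '_' is not legal in a URI hostname
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```
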
>>> My mitigating strategy has been to change my configuration to use
>>> "instance-local" storage (/mnt). I then copy the results out to s3
>>> using 'distcp'. This is odd since distcp seems ok with my s3/aws
>>> info.
>>>
>>> I'm still unclear as to the permissible characters in bucket names and
>>> access keys. I gather '/' is bad in the secret key and that '_' is
>>> bad for bucket names. Thus far I have only been able to get buckets to
>>> work in distcp that have only letters in their names, but I haven't
>>> tested too extensively.
>>>
>>> For example, I'd love to use buckets like:
>>> 'com.organization.hdfs.purpose'. This seems to fail. Using
>>> 'comorganizationhdfspurpose' works but clearly that is less than
>>> optimal.
>>>
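
One way to check how bucket names like these fare as the host part of an
s3:// URI is to ask java.net.URI directly. A small sketch follows; the
class name is made up and the underscore variant is a hypothetical name
added for comparison. It only exercises the JDK's URI parsing and says
nothing about what S3 or Hadoop themselves accept.

```java
import java.net.URI;

// Illustration only; bucket names are examples, not real buckets.
class BucketHostDemo {
    // Host reported by java.net.URI for s3://<bucket>/, or null when the
    // bucket name is not a syntactically valid hostname.
    static String hostFor(String bucket) {
        return URI.create("s3://" + bucket + "/").getHost();
    }

    public static void main(String[] args) {
        String[] buckets = {
            "comorganizationhdfspurpose",
            "com.organization.hdfs.purpose",
            "com_organization_hdfs_purpose"   // hypothetical underscore variant
        };
        for (String bucket : buckets) {
            // Prints the host, or "null" for the underscore variant.
            System.out.println(bucket + " -> " + hostFor(bucket));
        }
    }
}
```

The dotted name comes back as a normal host here, while the underscore
name yields a null host, so code that relies on getHost() would not see
a bucket at all in the underscore case.
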
>>> Like I say, I haven't dug into the source yet, but it is curious that
>>> distcp seems to work (at least where s3 is the destination) and hadoop
>>> fails when s3 is used as its storage.
>>>
>>> Anyone who has dealt with these issues, please post! It will help
>>> make the project better.
>>>
>>> -lincoln
>>>
>>> --
>>> lincolnritter.com
>>>
>>>
>>>
>>> On Wed, Jul 9, 2008 at 7:10 AM, slitz <[EMAIL PROTECTED]> wrote:
>>>> I'm having the exact same problem, any tip?
>>>>
>>>> slitz
>>>>
>>>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter
>>>> <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to use S3 with Hadoop 0.17.0 on EC2. Using this style of
>>>>> configuration:
>>>>>
>>>>> <property>
>>>>> <name>fs.default.name</name>
>>>>> <value>s3://$HDFS_BUCKET</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>> <name>fs.s3.awsAccessKeyId</name>
>>>>> <value>$AWS_ACCESS_KEY_ID</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>> <name>fs.s3.awsSecretAccessKey</name>
>>>>> <value>$AWS_SECRET_ACCESS_KEY</value>
>>>>> </property>
>>>>>
>>>>> on startup of the cluster with the bucket having no non-alphabetic
>>>>> characters, I get:
>>>>>
>>>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>>> at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>>> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>> at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
>>>>> If I use this style of configuration:
>>>>>
>>>>> <property>
>>>>> <name>fs.default.name</name>
>>>>> <value>s3://$AWS_ACCESS_KEY:[EMAIL PROTECTED]</value>
>>>>> </property>
>>>>>
>>>>> I get (where the all-caps portions are the actual values...):
>>>>>
>>>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.NumberFormatException: For input string:
>>>>> "[EMAIL PROTECTED]"
>>>>> at
>>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>> at java.lang.Integer.parseInt(Integer.java:447)
>>>>> at java.lang.Integer.parseInt(Integer.java:497)
>>>>> at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>>> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>> at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
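
Both traces above are consistent with the fs.default.name value being
handed to a strict host:port parser. A simplified sketch of that kind of
parser follows. It is a stand-in written for illustration, not the
NetUtils source, and it assumes the parser receives whatever follows
"s3://" in the configured value. BUCKET, KEY, and SECRET are placeholders.

```java
import java.net.InetSocketAddress;

// Simplified stand-in for a strict host:port parser; NOT the Hadoop source.
class HostPortSketch {
    static InetSocketAddress parse(String target) {
        int colon = target.indexOf(':');
        if (colon < 0) {
            throw new RuntimeException("Not a host:port pair: " + target);
        }
        String host = target.substring(0, colon);
        // Throws NumberFormatException when the text after ':' is not a number.
        int port = Integer.parseInt(target.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Returns the parsed port, or the exception type and message on failure.
    static String tryParse(String target) {
        try {
            return "port " + parse(target).getPort();
        } catch (RuntimeException e) { // NumberFormatException is a RuntimeException
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("BUCKET"));                    // no colon at all
        System.out.println(tryParse("KEY:SECRET@BUCKET"));         // "port" is SECRET@BUCKET
        System.out.println(tryParse("namenode.example.com:9000")); // port 9000
    }
}
```

Read this way, the first configuration fails because a bare bucket name
has no colon, and the second because everything after the first colon is
treated as a port number.
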
>>>>> These exceptions are taken from the namenode log. The datanode logs
>>>>> show the same exceptions.
>>>>>
>>>>> Other than the above configuration changes, the configuration is
>>>>> identical to that generated by the hadoop image creation script found
>>>>> in the 0.17.0 distribution.
>>>>>
>>>>> Can anybody point me in the right direction here?
>>>>>
>>>>> -lincoln
>>>>>
>>>>> --
>>>>> lincolnritter.com
>>>>>
>>>>
>>>
>>
>>
>
>
>