[ http://issues.apache.org/jira/browse/HADOOP-574?page=comments#action_12451572 ]

Tom White commented on HADOOP-574:
----------------------------------
I've been looking at Jim Kellerman's suggestion of using a URL-safe Base64 encoding for path names, since it is more compact than regular URL encoding. (An S3 key is a maximum of 1024 bytes.) See http://www.faqs.org/rfcs/rfc3548.html and http://www.faqs.org/qa/rfcc-1940.html for details. Jim has implemented these algorithms in a public domain Base64 encoding package (http://iharder.sourceforge.net/current/java/base64/, version 2.2).

However, I don't think Base64 encoding is compatible with the delimiter request parameter for S3 bucket listing. Base64 maps each group of three input bytes to four output characters, so it doesn't preserve byte boundaries: how a given character is encoded depends on its offset within the group and on the bytes next to it. It is therefore not possible to search for a substring in the Base64-encoded representation of some text.

The current S3 FileSystem code uses the delimiter request parameter as an efficient way to implement the listPathsRaw method. Therefore I think it is probably best to stick with the current URL encoding solution.

> want FileSystem implementation for Amazon S3
> --------------------------------------------
>
>                 Key: HADOOP-574
>                 URL: http://issues.apache.org/jira/browse/HADOOP-574
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 0.9.0
>            Reporter: Doug Cutting
>         Attachments: dependencies.zip, HADOOP-574.patch
>
>
> An S3-based Hadoop FileSystem would make a great addition to Hadoop.
> It would facilitate use of Hadoop on Amazon's EC2 computing grid, as
> discussed here:
> http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00318.html
> This is related to HADOOP-571, which would make Hadoop's FileSystem
> considerably easier to extend.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
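
To illustrate the byte-boundary point made in the comment above, here is a minimal sketch. It assumes Java 8 or later and uses the JDK's java.util.Base64 URL-safe encoder as a stand-in for the Base64 package mentioned in the comment. The class name, the helper methods and the sample paths are hypothetical and are not taken from HADOOP-574.patch. How the patch itself URL-encodes "/" is also not shown here; the sketch simply uses java.net.URLEncoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of the HADOOP-574 patch.
final class Base64DelimiterDemo {

    // URL-safe Base64 ("-" and "_" alphabet) without padding.
    // Assumption: java.util.Base64 stands in for the package named in the comment.
    static String base64Key(String path) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(path.getBytes(StandardCharsets.UTF_8));
    }

    // Plain URL encoding: each character maps to a fixed substring.
    static String urlKey(String path) {
        try {
            return URLEncoder.encode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // The "/" sits at a different byte offset in each sample path.
        String[] paths = {"a/b", "ab/c", "abc/d"};
        System.out.println("delimiter alone: base64=" + base64Key("/")
                + " url=" + urlKey("/"));
        for (String p : paths) {
            System.out.println(p + ": base64=" + base64Key(p)
                    + " url=" + urlKey(p));
        }
    }
}
```

For these sample paths the Base64 form of "/" on its own does not appear inside any of the Base64 keys, while the URL-encoded form of "/" appears unchanged in every URL-encoded key. That fixed form is what lets a delimiter be matched against the stored keys.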