[
https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478174
]
Tom White commented on HADOOP-1061:
-----------------------------------
S3 files need to be written with a version number of the client library that
wrote them (as suggested in HADOOP-930). If we did this now, then we could
detect a mismatch and fail fast (and informatively). While it would be
possible (but tricky) to support both versions, I don't feel that we should do
that since there is a workaround for data migration: copy your S3 data from the
old file system to a local file or HDFS (on EC2 preferably, but this isn't
necessary) using the old version of Hadoop, then copy it back to a new S3 file
system using a new version of Hadoop. I'd be happy to write this.
(I'm not saying that version-aware code will never be needed, just that it
isn't needed yet, since not many people are using this feature.)
Thoughts?
> S3 listSubPaths bug
> -------------------
>
> Key: HADOOP-1061
> URL: https://issues.apache.org/jira/browse/HADOOP-1061
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.11.2, 0.12.0
> Reporter: Mike Smith
> Priority: Critical
> Attachments: 1061-hadoop.patch
>
>
> I had a problem with the -ls command in the S3 file system. It was returning
> an inconsistent number of "Found Items" when rerun at different times and,
> more importantly, it returns recursive results (depth 1) for some folders.
> I looked into the code; the problem is caused by the jets3t library. The
> inconsistency problem will be solved if we use:
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
> instead of
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0);
> in listSubPaths of the Jets3tFileSystemStore class (line 227). This change
> lets the GET REST request carry a "max-keys" parameter with its default
> value of 1000. It seems the S3 GET request is sensitive to this parameter.
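To illustrate, here is a sketch of the bucket-listing REST call that the three-argument listObjects would produce (the bucket name and key prefix are made up for the example; max-keys falls back to the S3 default of 1000):

```
GET /?prefix=user%2Froot%2F&delimiter=%2F&max-keys=1000 HTTP/1.1
Host: examplebucket.s3.amazonaws.com
```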
> But the recursive problem occurs because the GET request doesn't apply the
> delimiter constraint correctly. The response contains all the keys with the
> given prefix, but they don't stop at the path delimiter. You can test this
> simply by making a couple of folders on the Hadoop S3 filesystem and running
> -ls. I followed the generated GET request and it looks fine, but it is not
> executed correctly on the S3 server side. I still don't know why the
> response doesn't stop at the path delimiter.
> Possible cause: the jets3t library already does URL encoding, so why do we
> need to do URL encoding in the Jets3tFileSystemStore class?
> example:
> The original path is /user/root/folder, and it is encoded to
> %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Then jets3t
> re-encodes this when making the REST request, so it is rewritten as
> %252Fuser%252Froot%252Ffolder, and the generated folder on S3 becomes
> %2Fuser%2Froot%2Ffolder after decoding on the Amazon side. Wouldn't it be
> better to skip the encoding step in Hadoop? This strange structure might be
> the reason that S3 doesn't stop at the path delimiter.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.