[ https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-1061:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I've just committed this.
> S3 listSubPaths bug
> -------------------
>
> Key: HADOOP-1061
> URL: https://issues.apache.org/jira/browse/HADOOP-1061
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.11.2, 0.12.0
> Reporter: Mike Smith
> Priority: Critical
> Fix For: 0.13.0
>
> Attachments: 1061-hadoop.patch, hadoop-1061-v2.patch,
> hadoop-1061-v3.patch, HADOOP-1061-v4.patch
>
>
> I had a problem with the -ls command on the S3 file system. It returned an
> inconsistent number of "Found Items" across repeated runs and, more
> importantly, it returned recursive results (depth 1) for some folders.
> I looked into the code; the problem is caused by the jets3t library. The
> inconsistency problem is solved if we use:
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
> instead of
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0);
> in listSubPaths of the Jets3tFileSystemStore class (line 227). With this
> change the GET REST request carries a "max-keys" parameter with its default
> value of 1000. It seems the S3 GET request is sensitive to this parameter.
> The recursive problem, however, occurs because the GET request does not apply
> the delimiter constraint correctly. The response contains all the keys with
> the given prefix, but they don't stop at the path_delimiter. You can test
> this by making a couple of folders on the Hadoop S3 file system and running
> -ls. I traced the generated GET request and it looks fine, but it is not
> executed correctly on the S3 server side. I still don't know why the response
> doesn't stop at the path_delimiter.
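
For reference, here is a minimal sketch of what a prefix-plus-delimiter listing is expected to return. It is a pure in-memory simulation, not a call into jets3t or S3; the class name DelimiterListingDemo and the method listImmediateChildren are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a delimiter-constrained listing.
public class DelimiterListingDemo {

    // Returns keys directly under the prefix. Deeper keys are rolled up
    // to the first delimiter after the prefix, so nothing below depth 1
    // appears in the result.
    static SortedSet<String> listImmediateChildren(
            Collection<String> keys, String prefix, String delimiter) {
        SortedSet<String> result = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested prefix
            }
            String rest = key.substring(prefix.length());
            int i = rest.indexOf(delimiter);
            if (i < 0) {
                result.add(key); // a direct child
            } else {
                // roll up everything below the first delimiter
                result.add(prefix + rest.substring(0, i + delimiter.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "/user/root/a",
                "/user/root/folder/x",
                "/user/root/folder/y/z",
                "/other");
        System.out.println(listImmediateChildren(keys, "/user/root/", "/"));
    }
}
```

The recursive -ls output described above corresponds to the roll-up branch never firing, so that deeper keys are returned as if they were direct children.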
> Possible cause: the jets3t library already does URL encoding, so why do we
> need to do URL encoding in the Jets3tFileSystemStore class as well?
> Example:
> The original path is /user/root/folder, and it is encoded to
> %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Jets3t then
> re-encodes this to make the REST request, so it is sent as
> %252Fuser%252Froot%252Ffolder, and the folder generated on S3 is
> %2Fuser%2Froot%2Ffolder after decoding on the Amazon side. Wouldn't it be
> better to skip the encoding step in Hadoop? This strange structure might be
> the reason that S3 doesn't stop at the path_delimiter.
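
The double-encoding effect in the example can be reproduced with the JDK alone. The sketch below uses java.net.URLEncoder and URLDecoder as stand-ins for the two encoding steps and the server-side decode; it does not call jets3t or Hadoop, and the class name DoubleEncodingDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo: encode twice, decode once, as in the example above.
public class DoubleEncodingDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/user/root/folder";
        String once = encode(path);    // stands in for the Hadoop-side encoding
        String twice = encode(once);   // stands in for the library re-encoding
        String stored = decode(twice); // stands in for a single server-side decode

        System.out.println(once);
        System.out.println(twice);
        System.out.println(stored);
        // The stored key no longer contains a literal "/" to split on.
        System.out.println(stored.contains("/"));
    }
}
```

If the stored key really has this form, a "/" delimiter has nothing to match against, which would be consistent with the hypothesis in the report; the sketch does not confirm that this is what S3 actually stores.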