[ 
https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-1061:
------------------------------

    Attachment: hadoop-1061-v3.patch

> It's not actually that hard to make it back-compatible

It's a bit harder than I thought when I wrote this, since we have changed the 
key format. So you would have to check for both forms of the key before 
saying the file or directory doesn't exist. I feel this would complicate the 
code somewhat.

I would suggest going with this new patch (v3), which additionally checks name 
and type. People will have to migrate old data as I originally described above. 
If this proved problematic, or there was demand, we could write a migration 
script. (The benefit of this approach is that it keeps the core S3FileSystem 
code unencumbered by version-migration logic.)
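For illustration, a back-compatible lookup along the lines rejected above would look roughly like this. This is a minimal sketch, not the actual Jets3tFileSystemStore code: the `BackCompatLookup` class and the Map standing in for the S3 bucket are hypothetical, and it assumes the old key format was the URL-encoded path while the new format is the raw path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class BackCompatLookup {
    // Hypothetical sketch: 'store' stands in for the S3 bucket's key -> inode mapping.
    // Assumption: old keys were URL-encoded paths, new keys are the raw paths.
    public static String lookup(Map<String, String> store, String path) {
        String newKey = path;                                            // new key format
        String oldKey = URLEncoder.encode(path, StandardCharsets.UTF_8); // old key format
        if (store.containsKey(newKey)) {
            return store.get(newKey);
        }
        if (store.containsKey(oldKey)) {  // fall back to the old form
            return store.get(oldKey);
        }
        return null; // neither form present: treat the path as nonexistent
    }
}
```

Every lookup that misses in the new format would pay a second round trip in the real store, which is part of why it complicates the code.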

> S3 listSubPaths bug
> -------------------
>
>                 Key: HADOOP-1061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1061
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.11.2, 0.12.0
>            Reporter: Mike Smith
>            Priority: Critical
>         Attachments: 1061-hadoop.patch, hadoop-1061-v2.patch, 
> hadoop-1061-v3.patch
>
>
> I had a problem with the -ls command in the S3 file system. It was returning 
> an inconsistent number of "Found Items" across reruns and, more importantly, 
> it returned recursive results (depth 1) for some folders. I looked into the 
> code; the problem is caused by the jets3t library. The inconsistency problem 
> will be solved if we use:
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
> instead of 
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER , 
> 0);
> in listSubPaths of the Jets3tFileSystemStore class (line 227)! This change 
> lets the GET REST request carry a "max-keys" parameter with a default value 
> of 1000! It seems the S3 GET request is sensitive to this parameter! 
> But the recursive problem occurs because the GET request doesn't apply the 
> delimiter constraint correctly. The response contains all the keys with the 
> given prefix, but they don't stop at the path delimiter. You can test this 
> simply by creating a couple of folders on the Hadoop S3 filesystem and 
> running -ls. I followed the generated GET request and it looks fine, but it 
> is not executed correctly on the S3 server side. I still don't know why the 
> response doesn't stop at the path delimiter. 
> Possible cause: the jets3t library does URL encoding, so why do we need to 
> do URL encoding in the Jets3tFileSystemStore class!?
> Example:
> The original path is /user/root/folder, and it is encoded to 
> %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Then jets3t 
> re-encodes this when making the REST request, so it is rewritten as 
> %252Fuser%252Froot%252Ffolder, and the generated folder on S3 becomes 
> %2Fuser%2Froot%2Ffolder after decoding on the Amazon side. Wouldn't it be 
> better to skip the encoding step in Hadoop? This strange structure might be 
> the reason that S3 doesn't stop at the path delimiter. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
