[
https://issues.apache.org/jira/browse/HADOOP-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478174
]
Tom White commented on HADOOP-1061:
-----------------------------------
S3 files need to be written with a version number of the client library that
wrote them (as suggested in HADOOP-930). If we did this now, then we could
detect a mismatch and fail fast (and informatively). While it would be
possible (but tricky) to support both versions, I don't feel that we should do
that since there is a workaround for data migration: copy your S3 data from the
old file system to a local file or HDFS (on EC2 preferably, but this isn't
necessary) using the old version of Hadoop, then copy it back to a new S3 file
system using a new version of Hadoop. I'd be happy to write this.
(I'm not saying that version-aware code will never be needed, just that it
isn't needed yet, since not many people are using this feature.)
Thoughts?
> S3 listSubPaths bug
> -------------------
>
> Key: HADOOP-1061
> URL: https://issues.apache.org/jira/browse/HADOOP-1061
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.11.2, 0.12.0
> Reporter: Mike Smith
> Priority: Critical
> Attachments: 1061-hadoop.patch
>
>
> I had a problem with the -ls command in the S3 file system. It was returning
> an inconsistent number of "Found Items" when rerun at different times and,
> more importantly, it returns recursive results (depth 1) for some folders.
> I looked into the code; the problem is caused by the jets3t library. The
> inconsistency problem will be solved if we use:
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
> instead of
> S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER, 0);
> in listSubPaths of the Jets3tFileSystemStore class (line 227). This change
> lets the GET REST request carry a "max-keys" parameter with its default
> value of 1000. It seems the S3 GET request is sensitive to this parameter.
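To illustrate, here is a sketch of the bucket-listing REST call that the three-argument listObjects would produce (the bucket name and key prefix are made up for the example; max-keys falls back to the S3 default of 1000):

```
GET /?prefix=user%2Froot%2F&delimiter=%2F&max-keys=1000 HTTP/1.1
Host: examplebucket.s3.amazonaws.com
```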
> But the recursive problem occurs because the GET request doesn't apply the
> delimiter constraint correctly. The response contains all the keys with the
> given prefix, but they don't stop at the path delimiter. You can test this
> simply by making a couple of folders on the Hadoop S3 filesystem and running
> -ls. I followed the generated GET request and it looks fine, but it is not
> executed correctly on the S3 server side. I still don't know why the
> response doesn't stop at the path delimiter.
> Possible cause: the jets3t library already does URL encoding, so why do we
> need to do URL encoding in the Jets3tFileSystemStore class?
> example:
> The original path is /user/root/folder, and it is encoded to
> %2Fuser%2Froot%2Ffolder in the Jets3tFileSystemStore class. Then jets3t
> re-encodes this when making the REST request, so it is rewritten as
> %252Fuser%252Froot%252Ffolder, and the generated folder on S3 becomes
> %2Fuser%2Froot%2Ffolder after decoding on the Amazon side. Wouldn't it be
> better to skip the encoding step in Hadoop? This strange structure might be
> the reason that S3 doesn't stop at the path delimiter.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.