[ 
https://issues.apache.org/jira/browse/HADOOP-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524238
 ] 

Ahad Rana commented on HADOOP-1783:
-----------------------------------

Hi Tom,

I will try to produce some stack traces for you. Ultimately, though, if you 
look at the DistributedFileSystem implementation of listPaths, it clearly 
creates fully qualified paths using the DfsPath(DFSFileInfo,FileSystem) 
constructor. In the s3 implementation, listPaths, as I mentioned, returns 
sub-paths without the scheme or the bucket name (authority). If the default 
file system is not s3, the hadoop library then returns improper results, 
because it tries to resolve the returned sub-path against the default 
FileSystem (since the scheme is missing from the path object).
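
To illustrate the resolution problem described above, here is a minimal 
sketch that uses java.net.URI as a stand-in for the hadoop Path and FileSystem 
classes. The qualify helper, the namenode address, and the bucket name are all 
hypothetical and are not taken from the hadoop code; the sketch only shows 
that a path with no scheme ends up under whatever default is in effect.

```java
import java.net.URI;

class QualifyDemo {
    // Hypothetical helper: mimics qualifying a path against a default
    // file system URI. A path that already has a scheme is left alone.
    static URI qualify(URI defaultFs, URI path) {
        if (path.getScheme() != null) {
            return path;
        }
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        // Assumed default file system (as if set via fs.default.name).
        URI defaultFs = URI.create("hdfs://namenode:9000/");

        // What a listing returns when scheme and bucket are dropped.
        URI bare = URI.create("/user/data/part-00000");
        // What a fully qualified listing entry would look like.
        URI full = URI.create("s3://my-bucket/user/data/part-00000");

        // The bare path is silently attributed to the default file system.
        System.out.println(qualify(defaultFs, bare));
        // prints hdfs://namenode:9000/user/data/part-00000

        // The qualified path keeps pointing at the bucket.
        System.out.println(qualify(defaultFs, full));
        // prints s3://my-bucket/user/data/part-00000
    }
}
```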

I am working on enabling map-reduce functionality for scenarios where at 
least one of the file specifications (map input and reduce output) in a 
map-reduce spec points to the s3 file system. The bug described above breaks 
the code in a couple of different places. When I implement keyToPath in 
Jets3tFileSystemStore as follows, everything works. 

private Path keyToPath(String key) {
  return new Path("s3://" + bucket.getName() + key);
}
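
The same idea can be tried outside of hadoop. The following is only a sketch, 
again using java.net.URI in place of Path; the class name, the keyToUri 
method, and the bucket name are hypothetical. It assumes that keys begin with 
"/", and it takes the bucket name as a parameter rather than reading it from a 
field.

```java
import java.net.URI;
import java.net.URISyntaxException;

class KeyToPathSketch {
    // Hypothetical analogue of keyToPath: builds a fully qualified
    // location from the bucket name (authority) and the key (path).
    static URI keyToUri(String bucketName, String key) {
        if (!key.startsWith("/")) {
            // Assumption: keys are stored as absolute path strings.
            throw new IllegalArgumentException("key must start with '/': " + key);
        }
        try {
            // Arguments are scheme, host, path, fragment.
            return new URI("s3", bucketName, key, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(keyToUri("my-bucket", "/user/data/part-00000"));
        // prints s3://my-bucket/user/data/part-00000
    }
}
```

Because the result carries both scheme and authority, it no longer depends on 
which file system is configured as the default.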

There are other (performance-related) issues that I am also looking at in 
order to make s3 a satisfactory input/output for a mapreduce job. But this 
bug is by far the most critical of them. 

Sorry about the lack of stack traces. I just need to recreate a proper test 
environment to get you these, and hopefully I will be able to submit something 
to you next week. 

Thanks,

Ahad.

> keyToPath in Jets3tFileSystemStore needs to return absolute path
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1783
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1783
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.1.0, 0.1.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 
> 0.5.0, 0.6.0, 0.6.1, 0.6.2, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.9.0, 0.9.1, 0.9.2, 
> 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.12.1, 0.12.2, 0.12.3, 
> 0.13.0, 0.13.1, 0.14.0
>         Environment: hadoop 0.14.0 running under ec2 with s3 filesystem
>            Reporter: Ahad Rana
>
> The keyToPath method probably needs to:
> 1. take the bucket identifier as a parameter.
> 2. set the returned Path object's protocol plus authority (bucket). 
> Currently, APIs such as listSubPaths return relative paths (for a 
> directory listing). This in turn breaks map reduce operations if the default 
> file system is set to be something other than S3 (via fs.default.name, for 
> example). 
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
