[ 
https://issues.apache.org/jira/browse/HADOOP-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709509#action_12709509
 ] 

Ian Nowland commented on HADOOP-5836:
-------------------------------------

The main fix is to stop listStatus() from returning this empty file. 
Along with this, I broadened the handling in all S3N methods to cover the 
different ways of designating directories in S3, as follows:
 
 * A note about directories. S3 of course has no "native" support for them.
 * The idiom we choose then is: for any directory created by this class,
 * we use an empty object "#{dirpath}_$folder$" as a marker.
 * Further, to interoperate with other S3 tools, we also accept the following:
 * - an object "#{dirpath}/" denoting a directory marker
 * - if any objects exist with the prefix "#{dirpath}/", then the
 *   directory is said to exist
 * - if both a file with the name of a directory and a marker for that
 *   directory exist, then the *file masks the directory*, and the directory
 *   is never returned.
 
In particular this meant fixing delete() and rename() to handle all three 
possible meanings of directory without failing.
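
As a rough illustration of the rules above (this is not the code in the patch), 
the sketch below resolves directories against an in-memory set of keys standing 
in for a bucket listing. The class and method names (DirectoryMarkerSketch, 
isDirectory, listUnder) are hypothetical, and it assumes that a plain file masks 
the directory whichever way the directory is denoted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the key set stands in for the objects in one bucket.
class DirectoryMarkerSketch {
    static final String FOLDER_SUFFIX = "_$folder$";

    /** True if dirPath counts as a directory under the rules in the comment. */
    static boolean isDirectory(Set<String> keys, String dirPath) {
        // Assumption: a file with the directory's name always masks it.
        if (keys.contains(dirPath)) {
            return false;
        }
        // Marker written by this class: empty object "<dirpath>_$folder$".
        if (keys.contains(dirPath + FOLDER_SUFFIX)) {
            return true;
        }
        // Marker "<dirpath>/" from other tools, or any object under that prefix.
        String prefix = dirPath + "/";
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Names of all objects under dirPath, skipping the "<dirpath>/" marker. */
    static List<String> listUnder(Set<String> keys, String dirPath) {
        String prefix = dirPath + "/";
        List<String> names = new ArrayList<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String name = key.substring(prefix.length());
            if (name.isEmpty()) {
                continue; // the marker itself, which would otherwise list as ""
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> keys = new TreeSet<>(Arrays.asList("mydir/", "mydir/part-0"));
        System.out.println(isDirectory(keys, "mydir"));
        System.out.println(listUnder(keys, "mydir"));
    }
}
```

The empty-name check in listUnder corresponds to the main fix: the "mydir/" 
marker object is not reported as a file named "".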
 
This patch also includes the following:
- Add logging any time a file in S3 is accessed for read or write, so that 
when accessing or using a file fails, its name will be in the task log.
- When opening a file for reading which doesn't exist, immediately throw a 
FileNotFoundException, rather than hitting a hard-to-debug NPE later when the 
file is closed.
- Rewrite rename so that it only deletes the source files after every 
destination file has been written, so you never end up with half the files in 
each location.
- Set up a retryer so rename automatically retries on S3 errors.
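
The rename ordering and the retry can be sketched roughly as below. This is an 
illustration only, not the patch itself: the in-memory map stands in for S3, and 
the names RenameSketch, retry and renamePrefix, as well as the limit of three 
attempts, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical sketch; the map stands in for the objects in one bucket.
class RenameSketch {
    /** Runs op up to maxAttempts times and rethrows the last failure. */
    static <T> T retry(int maxAttempts, Supplier<T> op) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // treated as transient here; try again
            }
        }
        throw last;
    }

    /** Moves every key starting with src so that it starts with dst instead. */
    static void renamePrefix(Map<String, byte[]> store, String src, String dst) {
        // Snapshot the source keys first so the map is not modified mid-iteration.
        List<String> sources = new ArrayList<>();
        for (String key : store.keySet()) {
            if (key.startsWith(src)) {
                sources.add(key);
            }
        }
        // Phase 1: write every destination object.
        for (String key : sources) {
            String target = dst + key.substring(src.length());
            retry(3, () -> store.put(target, store.get(key)));
        }
        // Phase 2: delete the sources only once all copies have succeeded.
        for (String key : sources) {
            retry(3, () -> store.remove(key));
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new TreeMap<>();
        store.put("in/part-0", new byte[] {1});
        renamePrefix(store, "in/", "out/");
        System.out.println(store.keySet());
    }
}
```

If a copy in phase 1 fails after all retries, the exception propagates before 
any source has been deleted, so the source files are all still in place.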


> Bug in S3N handling of directory markers using an object with a trailing "/" 
> causes jobs to fail
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5836
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.3
>            Reporter: Ian Nowland
>
> Some tools which upload to S3 use an object terminated with a "/" as a 
> directory marker, for instance "s3n://mybucket/mydir/". If asked to iterate 
> that "directory" via listStatus(), then the current code will return an empty 
> file "", which the InputFormatter happily assigns to a split, and which later 
> causes a task to fail, and probably the job to fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
