[ 
https://issues.apache.org/jira/browse/HADOOP-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168618#comment-15168618
 ] 

Jagdish Kewat commented on HADOOP-12837:
----------------------------------------

Hi [~cnauroth],

I have a path filter utility which takes a Path as input and returns true if the 
modification time of the given path is earlier than a specified time. Here's a 
method snippet for reference.
{code}
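  @Override
  public boolean accept(Path path) {
    try {
      FileStatus status = filesystem.getFileStatus(path);
      // Accept only paths last modified before the cutoff date.
      return status.getModificationTime() < this.date.getMillis();
    } catch (IOException e) {
      // Log the path and the exception, then reject the path.
      LOG.error("Could not get file status for " + path, e);
    }
    return false;
  }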
{code}

The actual job processes all the paths for which this returns true. Since the 
modification time for S3-based paths is returned as 0, this method returns true 
for every path specified, so the job processes unwanted data. The job doesn't 
fail; it just produces undesired output.
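
One way the filter could avoid accepting every path is sketched below. It assumes that a modification time of 0 means "not reported" rather than a real timestamp, which may not hold for every filesystem. The class and method names are hypothetical, and the check is plain Java with no dependency on the Hadoop API.

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class ModificationTimeCheck {

  private ModificationTimeCheck() {
  }

  /**
   * Returns true only if the modification time is known and is earlier
   * than the cutoff. Assumption: a value of 0 (or less) means the
   * filesystem did not report a modification time.
   */
  static boolean isOlderThan(long modificationTimeMillis, long cutoffMillis) {
    if (modificationTimeMillis <= 0) {
      return false; // unknown timestamp: do not accept the path
    }
    return modificationTimeMillis < cutoffMillis;
  }
}
```

Inside accept(), the comparison would then read something like ModificationTimeCheck.isOlderThan(fs.getModificationTime(), this.date.getMillis()), so paths with an unreported timestamp are skipped instead of being processed.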

Besides, I have a use case where we create a backup of the directories by 
renaming them with the timestamp of their modification time.
Here too the *filesystem* could be S3 or HDFS, so I need to find a generic 
solution.

A possible workaround I can think of is writing a dummy file like _SUCCESS 
in each of these directories and then looking up the modification time of that 
file; however, that would be added effort.

Thanks,
Jagdish
 

> FileStatus.getModificationTime not working on S3
> ------------------------------------------------
>
>                 Key: HADOOP-12837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12837
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Jagdish Kewat
>
> Hi Team,
> We have observed an issue with the FileStatus.getModificationTime() API on the S3 
> filesystem. The method always returns 0.
> I searched for this but couldn't find a solution that would fit 
> my use case. S3FileStatus seems to be an option; however, I would be 
> using this API on both HDFS and S3, so I can't go with it.
> I tried to run the job on:
> * Release label:emr-4.2.0
> * Hadoop distribution:Amazon 2.6.0
> * Hadoop Common jar: hadoop-common-2.6.0.jar
> Please advise if any patch or fix is available for this.
> Thanks,
> Jagdish



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
