[
https://issues.apache.org/jira/browse/HADOOP-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168618#comment-15168618
]
Jagdish Kewat commented on HADOOP-12837:
----------------------------------------
Hi [~cnauroth],
I have a path filter utility that takes a Path as input and returns true if the
modification time of the given path is earlier than a specified time. Here's a
method snippet for reference.
{code}
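@Override
public boolean accept(Path path) {
    try {
        FileStatus fs = filesystem.getFileStatus(path);
        // Accept only paths last modified before the cutoff date.
        if (fs.getModificationTime() < this.date.getMillis()) {
            return true;
        }
    } catch (IOException e) {
        // A path whose status cannot be read is logged and rejected.
        LOG.error(e.getMessage());
    }
    return false;
}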
{code}
The actual job processes every path for which this method returns true. Since the
modification time for S3-based paths is returned as 0, the method returns true
for all the paths specified, and the job processes unwanted data. The job
doesn't fail; it just produces undesired output.
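To illustrate the failure mode, here is a minimal sketch of a comparison that rejects a path when its modification time is not known, instead of accepting it. It assumes that a value of 0 means "no modification time available", which is an assumption and not documented behaviour. The names ModTimeCheck, knownModTime and isOlderThan are hypothetical; in the filter above, rawMillis would come from FileStatus.getModificationTime().

```java
import java.util.OptionalLong;

/** Hypothetical helper, not part of Hadoop. */
final class ModTimeCheck {
    private ModTimeCheck() {}

    /** Assumption: a raw value of 0 (or less) means the time is unknown. */
    static OptionalLong knownModTime(long rawMillis) {
        return rawMillis > 0 ? OptionalLong.of(rawMillis) : OptionalLong.empty();
    }

    /** True only when the time is known and strictly before the cutoff. */
    static boolean isOlderThan(long rawMillis, long cutoffMillis) {
        OptionalLong known = knownModTime(rawMillis);
        return known.isPresent() && known.getAsLong() < cutoffMillis;
    }
}
```

With this check, a path that reports 0 is skipped rather than processed. Whether skipping is the right default depends on the job.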
Besides, I have a use case where we back up directories by renaming them with
the timestamp of their modification time.
Also, the *filesystem* here could be S3 or HDFS, so I need to find a generic
solution.
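As a rough sketch of the backup naming described above, the following builds a backup name from a directory name and a modification time in milliseconds. The class name BackupNames, the method backupName, the "_" separator and the timestamp pattern are all hypothetical choices for illustration. The code refuses a time of 0, on the assumption that 0 means the time is unavailable.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of Hadoop. */
final class BackupNames {
    // Compact UTC timestamp, e.g. 20160225T000000Z (format is an arbitrary choice).
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

    private BackupNames() {}

    /** Returns dirName with the modification time appended as a suffix. */
    static String backupName(String dirName, long modTimeMillis) {
        if (modTimeMillis <= 0) {
            // Assumption: 0 means the filesystem did not report a time.
            throw new IllegalArgumentException("modification time unavailable for " + dirName);
        }
        return dirName + "_" + STAMP.format(Instant.ofEpochMilli(modTimeMillis));
    }
}
```

Failing loudly here avoids renaming every S3 directory to the same 1970 timestamp.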
A probable workaround I can think of is writing a dummy file such as _SUCCESS
in each of these directories and then looking up the modification time of that file;
however, that would be added effort.
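The marker-file idea can be sketched as follows. To keep the example self-contained it uses java.nio.file on a local directory as a stand-in; a real job would go through the Hadoop FileSystem API instead, and this sketch makes no claim about how S3 reports times for such files. The class MarkerFiles and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper using the local filesystem as a stand-in. */
final class MarkerFiles {
    static final String MARKER = "_SUCCESS";

    private MarkerFiles() {}

    /** Creates an empty marker file in dir if it does not exist yet. */
    static Path touchMarker(Path dir) throws IOException {
        Path marker = dir.resolve(MARKER);
        if (Files.notExists(marker)) {
            Files.createFile(marker);
        }
        return marker;
    }

    /** Returns the marker's modification time in millis, or 0 if there is no marker. */
    static long markerModTime(Path dir) throws IOException {
        Path marker = dir.resolve(MARKER);
        return Files.exists(marker) ? Files.getLastModifiedTime(marker).toMillis() : 0L;
    }
}
```

The filter would then compare markerModTime(dir) against the cutoff instead of the directory's own modification time.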
Thanks,
Jagdish
> FileStatus.getModificationTime not working on S3
> ------------------------------------------------
>
> Key: HADOOP-12837
> URL: https://issues.apache.org/jira/browse/HADOOP-12837
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Jagdish Kewat
>
> Hi Team,
> We have observed an issue with the FileStatus.getModificationTime() API on the S3
> filesystem. The method always returns 0.
> I searched for this but couldn't find a solution that would fit
> my use case. S3FileStatus seems to be an option; however, I would be
> using this API on both HDFS and S3, so I can't use it.
> I tried to run the job on:
> * Release label:emr-4.2.0
> * Hadoop distribution:Amazon 2.6.0
> * Hadoop Common jar: hadoop-common-2.6.0.jar
> Please advise if any patch or fix is available for this.
> Thanks,
> Jagdish
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)