[ https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510203#comment-16510203 ]

Arun Suresh edited comment on MAPREDUCE-7101 at 6/12/18 9:04 PM:
-----------------------------------------------------------------

Thanks [~tmarquardt].
The patch looks good to me. +1  The comment describing the new field in 
JHAdminConfig is wrong; that's a minor thing I can fix before committing.

I'll wait until EOD before committing, in case anyone has issues with the patch.

This patch retains the default behavior, and specific cloud deployments can 
choose to always scan. A pluggable, FS-specific scan is probably a better 
long-term solution, but I agree with [~leftnoteasy] and [~rohithsharma] that 
we should go ahead with the approach in this patch to unblock.
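To make the trade-off concrete, here is a minimal sketch of the kind of gating described above: the existing modification-time check stays the default, and a deployment can opt in to scanning on every call. The class `ScanGate`, its `alwaysScan` flag, and the `Runnable` standing in for `scanIntermediateDirectory` are all hypothetical; the actual name of the new `JHAdminConfig` field is not given in this comment.

```java
// Hypothetical sketch of an opt-in "always scan" switch around an
// mtime-gated directory scan. Not the actual Hadoop implementation.
final class ScanGate {
  private final boolean alwaysScan; // hypothetical config value; false keeps the old behavior
  private long modTime = -1;        // last directory mtime seen (-1 = never scanned)
  private int scans = 0;            // number of scans performed, for illustration

  ScanGate(boolean alwaysScan) {
    this.alwaysScan = alwaysScan;
  }

  /** Scans if forced by config, or if the directory mtime has changed. */
  synchronized boolean scanIfNeeded(long newModTime, Runnable scan) {
    if (alwaysScan || modTime != newModTime) {
      modTime = newModTime;
      scans++;
      scan.run(); // stands in for scanIntermediateDirectory(p)
      return true;
    }
    return false;
  }

  synchronized int scanCount() {
    return scans;
  }
}
```

With `alwaysScan` set to `false`, repeated calls with an unchanged modification time (for example, a directory mtime that is always 0) scan only once; with it set to `true`, every call scans, at the cost of extra listing work on file systems where the mtime check would have been sufficient.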





> Revisit behavior of JHS scan file behavior
> ------------------------------------------
>
>                 Key: MAPREDUCE-7101
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Thomas Marquardt
>            Priority: Critical
>         Attachments: MAPREDUCE-7101.001.patch
>
>
> Currently, the JHS scans a directory only if the modification time of the *directory* has changed: 
> {code} 
>     public synchronized void scanIfNeeded(FileStatus fs) {
>       long newModTime = fs.getModificationTime();
>       if (modTime != newModTime) {
>         // ... some logic omitted ...
>         // reset scanTime before scanning happens
>         scanTime = System.currentTimeMillis();
>         Path p = fs.getPath();
>         try {
>           scanIntermediateDirectory(p);
> {code}
> This logic assumes that the directory's modification time will be updated 
> whenever a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues 
> with truncated modification times, and HADOOP-12837 notes that on S3 a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly across 
> different file systems.
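The failure mode described in the issue can be reproduced with a small in-memory simulation. `FakeDir`, `MtimeGatedScanner`, and the file names used with them are hypothetical; the directory's reported modification time is pinned at 0 to mimic the S3 behavior mentioned above, so a scanner gated only on that value never sees files added after its first scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory directory whose reported mtime never changes,
// mimicking a store where a directory's modification time is always 0.
final class FakeDir {
  private final List<String> files = new ArrayList<>();

  void add(String name) {
    files.add(name); // note: the reported modification time is NOT updated
  }

  long getModificationTime() {
    return 0L;
  }

  List<String> list() {
    return new ArrayList<>(files);
  }
}

// Scanner that only rescans when the directory mtime differs from the last one seen.
final class MtimeGatedScanner {
  private long modTime = -1; // -1 means "never scanned"
  private final Set<String> seen = new HashSet<>();

  void scanIfNeeded(FakeDir dir) {
    long newModTime = dir.getModificationTime();
    if (modTime != newModTime) {
      modTime = newModTime;
      seen.addAll(dir.list());
    }
  }

  Set<String> seen() {
    return seen;
  }
}
```

If a file is added between two `scanIfNeeded` calls, `seen()` still contains only the files present at the first scan, because the unchanged modification time suppresses the rescan.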



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
