[ 
https://issues.apache.org/jira/browse/PIG-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317236#comment-15317236
 ] 

Daniel Dai commented on PIG-4916:
---------------------------------

Good catch on the Hadoop 1 compilation failure. Given that, I'd only put the 
fix in 0.17, where we drop Hadoop 1.x support. I don't want to add short-lived 
code and complicate the process.

The TezSessionManager.shutdown failure is due to the FileSystem being closed 
and should be fixed. What's the benefit of switching all shutdown hooks to use 
the same mechanism? Code consistency?

> Pig on Tez fail to remove temporary HDFS files in some cases
> ------------------------------------------------------------
>
>                 Key: PIG-4916
>                 URL: https://issues.apache.org/jira/browse/PIG-4916
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0
>
>         Attachments: PIG-4916-1.patch
>
>
> We saw the following stack trace when running Pig on S3:
> {code}
> 2016-06-01 22:04:22,714 [Thread-19] INFO  
> org.apache.hadoop.service.AbstractService - Service 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state 
> STOPPED; cause: java.io.IOException: Filesystem closed
> java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>       at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>       at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       at 
> org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>       at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>       at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> 2016-06-01 22:04:22,718 [Thread-19] ERROR 
> org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Error 
> shutting down Tez session org.apache.tez.client.TezClient@48bf833a
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
>       at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>       at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
>       at 
> org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>       at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>       at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> Caused by: java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>       at 
> org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>       at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>       at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       ... 4 more
> {code}
> The job runs successfully, but the temporary HDFS files are not removed.
> [~cnauroth] points out that FileSystem also uses a shutdown hook to close 
> FileSystem instances, and it might run before Pig's shutdown hook in Main. By 
> switching to Hadoop's ShutdownHookManager, we can impose an order on the 
> shutdown hooks.
> This has been verified by testing the following code in Main:
> {code}
>         ShutdownHookManager.get().addShutdownHook(new Runnable() {
>             @Override
>             public void run() {
>                 FileLocalizer.deleteTempResourceFiles();
>             }
>         }, priority);
> {code}
> Note that FileSystem.SHUTDOWN_HOOK_PRIORITY=10. With priority=9, Pig fails to 
> remove the temporary files; with priority=11, it succeeds.
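
The priority behaviour described above can be illustrated with a minimal
plain-Java sketch. It is a simulation only and does not use Hadoop: the class
PriorityHooksSketch and its methods are hypothetical names made up for this
example, and it assumes (as Hadoop's ShutdownHookManager is understood to do)
that hooks with a higher priority run first. The constant 10 mirrors the
FileSystem.SHUTDOWN_HOOK_PRIORITY value quoted in the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of priority-ordered shutdown hooks.
// This is NOT Hadoop's ShutdownHookManager; all names here are hypothetical.
class PriorityHooksSketch {
    // Mirrors FileSystem.SHUTDOWN_HOOK_PRIORITY as quoted in the issue.
    static final int FS_PRIORITY = 10;

    // A registered hook together with its priority.
    private static final class Hook {
        final Runnable action;
        final int priority;

        Hook(Runnable action, int priority) {
            this.action = action;
            this.priority = priority;
        }
    }

    private final List<Hook> hooks = new ArrayList<>();

    void addShutdownHook(Runnable action, int priority) {
        hooks.add(new Hook(action, priority));
    }

    // Run all hooks from highest to lowest priority (assumed ordering).
    void runAll() {
        List<Hook> sorted = new ArrayList<>(hooks);
        sorted.sort(Comparator.comparingInt((Hook h) -> h.priority).reversed());
        for (Hook h : sorted) {
            h.action.run();
        }
    }

    // Returns true if the simulated temp-file cleanup ran while the
    // simulated file system was still open.
    static boolean cleanupSucceeds(int pigPriority) {
        final boolean[] fsOpen = {true};
        final boolean[] cleaned = {false};
        PriorityHooksSketch manager = new PriorityHooksSketch();
        // Stand-in for FileSystem's own hook, which closes the file system.
        manager.addShutdownHook(() -> fsOpen[0] = false, FS_PRIORITY);
        // Stand-in for Pig's cleanup hook; it only works on an open file system.
        manager.addShutdownHook(() -> cleaned[0] = fsOpen[0], pigPriority);
        manager.runAll();
        return cleaned[0];
    }

    public static void main(String[] args) {
        System.out.println("priority=9:  cleanup ok? " + cleanupSucceeds(9));
        System.out.println("priority=11: cleanup ok? " + cleanupSucceeds(11));
    }
}
```

In this simulation a cleanup hook registered at priority 9 runs after the
file system has been closed, while one at priority 11 runs before it, which
matches the observation reported in the issue.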



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
