[
https://issues.apache.org/jira/browse/PIG-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311391#comment-15311391
]
Chris Nauroth commented on PIG-4916:
------------------------------------
Hello [~daijy]. Thank you for the patch. +1 (non-binding) from me. I agree
that it isn't feasible to write a unit test for this, and we have confirmation
from your manual testing that it works.
> Pig on Tez fail to remove temporary HDFS files in some cases
> ------------------------------------------------------------
>
> Key: PIG-4916
> URL: https://issues.apache.org/jira/browse/PIG-4916
> Project: Pig
> Issue Type: Bug
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.17.0, 0.16.1
>
> Attachments: PIG-4916-1.patch
>
>
> We saw the following stack trace when running Pig on S3:
> {code}
> 2016-06-01 22:04:22,714 [Thread-19] INFO org.apache.hadoop.service.AbstractService - Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state STOPPED; cause: java.io.IOException: Filesystem closed
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>     at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> 2016-06-01 22:04:22,718 [Thread-19] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Error shutting down Tez session org.apache.tez.client.TezClient@48bf833a
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
>     at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>     at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> Caused by: java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     ... 4 more
> {code}
> The job runs successfully, but the temporary HDFS files are not removed.
> [~cnauroth] points out that FileSystem also uses a shutdown hook to close
> FileSystem instances, and it might run before Pig's shutdown hook in Main. By
> switching to Hadoop's ShutdownHookManager, we can impose an ordering on the
> shutdown hooks.
> This has been verified by testing the following code in Main:
> {code}
> ShutdownHookManager.get().addShutdownHook(new Runnable() {
>     @Override
>     public void run() {
>         FileLocalizer.deleteTempResourceFiles();
>     }
> }, priority);
> {code}
> Note that FileSystem.SHUTDOWN_HOOK_PRIORITY=10. With priority=9, Pig fails;
> with priority=11, Pig succeeds.
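The fix above relies on ShutdownHookManager's ordering contract: hooks registered with a higher priority run earlier. The standalone sketch below (not Hadoop's actual implementation; the class and hook names are made up for illustration) mimics that contract to show why priority 11 lets Pig's temp-file cleanup run before the FileSystem close hook at priority 10:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the ordering contract used by Hadoop's
// ShutdownHookManager: higher-priority hooks run first. This is an
// illustrative toy, not the real Hadoop class.
public class HookOrderDemo {
    // One registered hook: its action plus its priority.
    record Hook(Runnable action, int priority) {}

    private final List<Hook> hooks = new ArrayList<>();

    void addShutdownHook(Runnable action, int priority) {
        hooks.add(new Hook(action, priority));
    }

    // Simulate JVM shutdown: run hooks in descending priority order.
    void runAll() {
        hooks.stream()
             .sorted(Comparator.comparingInt((Hook h) -> h.priority()).reversed())
             .forEach(h -> h.action().run());
    }

    static List<String> demoOrder() {
        List<String> order = new ArrayList<>();
        HookOrderDemo mgr = new HookOrderDemo();
        // FileSystem.SHUTDOWN_HOOK_PRIORITY is 10 in Hadoop.
        mgr.addShutdownHook(() -> order.add("filesystem-close"), 10);
        // Pig's cleanup must run before the FileSystem closes,
        // so it registers with a higher priority (11).
        mgr.addShutdownHook(() -> order.add("pig-temp-cleanup"), 11);
        mgr.runAll();
        return order;
    }

    public static void main(String[] args) {
        // Cleanup runs first, then the FileSystem close.
        System.out.println(demoOrder());
    }
}
```

With priority 9 instead of 11, the cleanup hook would sort after the FileSystem close and hit the "Filesystem closed" IOException, which matches the observed behavior in the description.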
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)