[ https://issues.apache.org/jira/browse/FLINK-13228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891795#comment-16891795 ]

Yu Li edited comment on FLINK-13228 at 7/24/19 12:08 PM:
---------------------------------------------------------

{noformat}
23:31:07,552 WARN org.apache.hadoop.hdfs.DataStreamer - DataStreamer Exception
java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:478)
{noformat}

After a closer check of the above log, I found the root cause: a 
{{ClosedByInterruptException}} occurs after {{channel.write(buf)}} has finished 
writing the data in {{SocketOutputStream#performIO}}, and I can now stably 
reproduce the issue with the attached v2 Hadoop patch.
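
For reference, the JDK-level mechanism can be shown outside of Hadoop as well. Below is a minimal, self-contained sketch (not Hadoop code; the local socket setup is purely illustrative): {{SocketChannel}} is an {{InterruptibleChannel}}, so an interrupt on the writing thread makes {{AbstractInterruptibleChannel.end()}} surface a {{ClosedByInterruptException}} from {{write()}}, which are exactly the two frames in the stack trace above. In our case the interrupt arrives only after the data has been transferred inside {{SocketOutputStream#performIO}}, but the reported exception is the same.

{code:java}
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class InterruptedChannelWriteDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocketChannel server =
                     ServerSocketChannel.open().bind(new InetSocketAddress("localhost", 0));
             SocketChannel client = SocketChannel.open(server.getLocalAddress());
             // keep the peer end open so the write has somewhere to go
             SocketChannel peer = server.accept()) {

            // Simulate an interrupt hitting the writer thread.
            Thread.currentThread().interrupt();
            try {
                client.write(ByteBuffer.wrap("data".getBytes()));
            } catch (ClosedByInterruptException e) {
                // The interruptible-channel machinery reports the pending interrupt as
                // ClosedByInterruptException, thrown from AbstractInterruptibleChannel.end()
                // on the way out of SocketChannelImpl.write().
                System.out.println("write failed with: " + e);
            } finally {
                Thread.interrupted(); // clear the interrupt flag again
            }
        }
    }
}
{code}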

Since all the data is already written successfully and the file is also marked as 
complete by the NameNode, we should silently ignore the 
{{ClosedByInterruptException}} instead of throwing it out as an error, which 
IMO is something Hadoop should fix. Will file a JIRA for HDFS once I figure out a 
proper solution.
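
Just to make that suggestion concrete, the kind of handling I have in mind on the Hadoop side is sketched below. This is explicitly *not* actual {{DataStreamer}} code; the {{PacketSender}} interface and its methods are hypothetical stand-ins for the real write/ack bookkeeping:

{code:java}
import java.nio.channels.ClosedByInterruptException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch only -- NOT org.apache.hadoop.hdfs.DataStreamer code.
// It merely illustrates the suggested behaviour: once all data is acked and the
// file is complete, an interrupt-triggered channel close is no longer an error.
public class IgnoreInterruptAfterCompleteWrite {

    private static final Logger LOG =
            LoggerFactory.getLogger(IgnoreInterruptAfterCompleteWrite.class);

    interface PacketSender {
        void sendPacket() throws ClosedByInterruptException; // hypothetical write step
        boolean allDataAckedAndFileComplete();                // hypothetical completeness check
    }

    static void sendLastPacket(PacketSender sender) throws ClosedByInterruptException {
        try {
            sender.sendPacket();
        } catch (ClosedByInterruptException e) {
            if (sender.allDataAckedAndFileComplete()) {
                // The data is safe and the NameNode has marked the file complete,
                // so log instead of propagating the exception as a streamer error.
                LOG.debug("Ignoring ClosedByInterruptException after successful write", e);
                Thread.currentThread().interrupt(); // keep the interrupt status visible
            } else {
                throw e;
            }
        }
    }
}
{code}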

As for how to fix the issue here: since the exception is thrown while closing the 
{{RecoverableFsDataOutputStream}} (easy to confirm after flattening the 
try-with-resources into a normal try-catch), I think we could directly try-catch 
the exception and ignore a failure to close the 
{{RecoverableFsDataOutputStream}}, because that is irrelevant to the target of 
the test case (checking whether commit after a normal close works). Wdyt? 
[~till.rohrmann] [~Zentol]
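
A rough sketch of the shape of that change follows; the setup names ({{getFileSystem()}}, {{basePath}}, {{testData}}) are illustrative placeholders rather than the exact {{HadoopRecoverableWriterTest}} fixtures:

{code:java}
// Flatten the try-with-resources into an explicit try/finally so a failure in
// close() can be ignored without masking the commit behaviour under test.
final RecoverableWriter writer = getFileSystem().createRecoverableWriter();
final Path path = new Path(basePath, "part-0"); // illustrative path

RecoverableFsDataOutputStream out = null;
try {
    out = writer.open(path);
    out.write(testData.getBytes(StandardCharsets.UTF_8));
    out.closeForCommit().commit(); // the behaviour this test actually verifies
} finally {
    if (out != null) {
        try {
            out.close();
        } catch (IOException e) {
            // Closing the already-committed stream may fail with the
            // interrupt-induced exception described above; that is irrelevant
            // to what the test checks, so it is swallowed here.
        }
    }
}
{code}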

Will attach the draft patch here for a straightforward check.

 


> HadoopRecoverableWriterTest.testCommitAfterNormalClose fails on Travis
> ----------------------------------------------------------------------
>
>                 Key: FLINK-13228
>                 URL: https://issues.apache.org/jira/browse/FLINK-13228
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.9.0
>            Reporter: Till Rohrmann
>            Assignee: Yu Li
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.9.0
>
>         Attachments: FLINK-13228.hadoop.debug.patch, 
> FLINK-13228.hadoop.debug.v2.patch
>
>
> {{HadoopRecoverableWriterTest.testCommitAfterNormalClose}} failed on Travis 
> with
> {code}
> HadoopRecoverableWriterTest.testCommitAfterNormalClose » IO The stream is closed
> {code}
> https://api.travis-ci.org/v3/job/557293706/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
