[ 
https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863863#comment-15863863
 ] 

Nico Kruber commented on FLINK-5553:
------------------------------------

If you try again with the PR above, you should now get a more meaningful 
exception that points you to the original exception.

From the log, I'd also guess that this is due to insufficient network buffers, 
though.
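
For a rough sense of what "enough" network buffers means, here is a minimal sketch. It assumes the rule of thumb from the Flink configuration docs of that era (slots per TaskManager squared, times the number of TaskManagers, times 4) and the taskmanager.network.numberOfBuffers setting. The slot and TaskManager counts below are hypothetical and are not taken from this job.

```java
public class NetworkBufferEstimate {

    // Rule of thumb (assumed from the Flink 1.x configuration docs):
    // buffers = slotsPerTaskManager^2 * numTaskManagers * 4
    public static long estimateBuffers(int slotsPerTaskManager, int numTaskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * numTaskManagers * 4L;
    }

    public static void main(String[] args) {
        int slotsPerTaskManager = 2; // hypothetical value, not from the issue
        int numTaskManagers = 5;     // hypothetical value, not from the issue

        long buffers = estimateBuffers(slotsPerTaskManager, numTaskManagers);

        // Candidate value for flink-conf.yaml; compare it with the configured one.
        System.out.println("taskmanager.network.numberOfBuffers: " + buffers);
    }
}
```

If the configured value is below such an estimate, raising it would be the first thing to try. The estimate is only a lower-bound heuristic and says nothing definite about this particular failure.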

> Job fails during deployment with IllegalStateException from subpartition 
> request
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5553
>                 URL: https://issues.apache.org/jira/browse/FLINK-5553
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.3.0
>            Reporter: Robert Metzger
>            Assignee: Nico Kruber
>         Attachments: application-1484132267957-0076
>
>
> While running a test job with Flink 1.3-SNAPSHOT 
> (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:
> {code}
> 2017-01-18 14:56:27,043 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,059 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,817 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-18 14:56:27,819 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING.
> java.lang.IllegalStateException: There has been an error in the channel.
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>         at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>         at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>         at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>         at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the first exception that is reported to the JobManager.
> I think this is related to missing network buffers. You can see that from the 
> next deployment after the restart, which fails with the "insufficient number 
> of buffers" exception.
> I'll add logs to the JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)