[
https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863839#comment-15863839
]
ASF GitHub Bot commented on FLINK-5553:
---------------------------------------
GitHub user NicoK opened a pull request:
https://github.com/apache/flink/pull/3299
[FLINK-5553] keep the original throwable in PartitionRequestClientHandler
This way, when checking for a previous error in any input channel, we can
throw a meaningful exception instead of the inspecific
`IllegalStateException("There has been an error in the channel.")` before.
Note that the original `Throwable` (from an existing channel) may or may
not(!) have been printed by the `InputGate` yet. Any new input channel,
however, did not get the `Throwable` and must fail through the (now enhanced)
fallback mechanism.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NicoK/flink flink-5553
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3299.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3299
----
commit a722d7cd4c4218543c87c2e8a3b3bbc708bddf55
Author: Nico Kruber <[email protected]>
Date: 2017-02-13T15:30:59Z
[FLINK-5553] keep the original throwable in PartitionRequestClientHandler
This way, when checking for a previous error in any input channel, we can
throw
a meaningful exception instead of the inspecific
IllegalStateException("There has been an error in the channel.") before.
Note that the original throwable (from an existing channel) may or may
not(!)
have been printed by the InputGate yet. Any new input channel, however, did
not
get the Throwable and must fail through the (now enhanced) fallback
mechanism.
----
> Job fails during deployment with IllegalStateException from subpartition
> request
> --------------------------------------------------------------------------------
>
> Key: FLINK-5553
> URL: https://issues.apache.org/jira/browse/FLINK-5553
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.3.0
> Reporter: Robert Metzger
> Attachments: application-1484132267957-0076
>
>
> While running a test job with Flink 1.3-SNAPSHOT
> (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:
> {code}
> 2017-01-18 14:56:27,043 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed
> (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,059 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map
> (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,817 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map
> (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: There has been an error in the channel.
> at
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
> at
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
> at
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
> at
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
> at
> org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
> at
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
> at
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-18 14:56:27,819 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Job
> Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING
> to FAILING.
> java.lang.IllegalStateException: There has been an error in the channel.
> at
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
> at
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
> at
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
> at
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
> at
> org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
> at
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
> at
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the first exception that is reported to the jobmanager.
> I think this is related to missing network buffers. You see that from the
> next deployment after the restart, where the deployment fails with the
> insufficient number of buffers exception.
> I'll add logs to the JIRA.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)