[
https://issues.apache.org/jira/browse/FLINK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641625#comment-16641625
]
ASF GitHub Bot commented on FLINK-10319:
----------------------------------------
tillrohrmann commented on issue #6680: [FLINK-10319] [runtime] Too many
requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-427783408
I see the problem with very large jobs. Maybe we could solve it a bit
differently, by deploying tasks in topological order when using `EAGER`
scheduling.
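The topological-order idea mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration (the class and method names are not Flink's actual scheduler API): producers are "deployed" before their consumers via a Kahn-style topological sort, so a consumer's partition request never races ahead of its producer's deployment.

```java
import java.util.*;

// Hypothetical sketch, not Flink's real scheduler: deploy tasks in
// topological order so a producer is always deployed before its consumers,
// avoiding requestPartitionState round-trips to the JM.
public class TopologicalDeploy {
    // edges: producer -> list of its consumers
    public static List<String> topologicalOrder(Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String v : edges.keySet()) inDegree.putIfAbsent(v, 0);
        for (List<String> consumers : edges.values())
            for (String c : consumers) inDegree.merge(c, 1, Integer::sum);

        // start with tasks that consume nothing (sources)
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task); // "deploy" the task at this point
            for (String c : edges.getOrDefault(task, List.of()))
                if (inDegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("source", List.of("map"));
        graph.put("map", List.of("sink"));
        graph.put("sink", List.of());
        System.out.println(topologicalOrder(graph)); // [source, map, sink]
    }
}
```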
Concerning your answer to my second question: what if the producer partition
gets disposed (e.g. due to a failover which does not necessarily restart
the downstream operators)? At the moment an upstream task failure will always
fail the downstream consumers. However, this can change in the future, and the
more assumptions (e.g. that downstream operators will be failed if upstream
operators fail) we bake in, the harder it gets to change this behaviour.
Moreover, I think it is always a good idea to make the components as
self-contained as possible. This also entails that the failover behaviour
should ideally not depend on other things happening. Therefore, I'm a bit
hesitant to change the existing behaviour.
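The alternative discussed in the thread, retriggering the partition request locally instead of asking the JM for the producer's state, can be sketched as a bounded retry loop. This is an illustrative sketch only; `requestPartition`, `maxRetries`, and `backoffMillis` are hypothetical names, not Flink's actual InputGate API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the retry-locally idea: instead of a
// requestPartitionState RPC to the JM, the consumer retriggers its own
// partition request a bounded number of times and fails on its own if the
// producer stays unreachable.
public class LocalRetry {
    public static boolean retriggerPartitionRequest(BooleanSupplier requestPartition,
                                                    int maxRetries,
                                                    long backoffMillis) throws InterruptedException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (requestPartition.getAsBoolean()) {
                return true; // producer is alive and the partition is reachable
            }
            Thread.sleep(backoffMillis); // back off before retrying locally
        }
        return false; // producer is gone: fail the task, no JM round-trip needed
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated producer that becomes reachable on the third attempt.
        int[] calls = {0};
        boolean ok = retriggerPartitionRequest(() -> ++calls[0] >= 3, 5, 1L);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```

The point of the bound is that the failure decision stays inside the task: the consumer neither floods the JM with RPCs nor assumes anything about how an upstream failure will be propagated.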
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Too many requestPartitionState would crash JM
> ---------------------------------------------
>
> Key: FLINK-10319
> URL: https://issues.apache.org/jira/browse/FLINK-10319
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.7.0
> Reporter: TisonKun
> Assignee: TisonKun
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Do not requestPartitionState from the JM when a partition request fails; doing
> so may generate too many RPC requests and block the JM.
> We gain little benefit from checking which state the producer is in, while the
> extra RPC requests can crash the JM. A task can always
> retriggerPartitionRequest from its InputGate: the request fails if the
> producer is gone and succeeds if the producer is alive. Either way, there is
> no need to ask the JM for help.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)