[ https://issues.apache.org/jira/browse/FLINK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641625#comment-16641625 ]

ASF GitHub Bot commented on FLINK-10319:
----------------------------------------

tillrohrmann commented on issue #6680: [FLINK-10319] [runtime] Too many 
requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-427783408
 
 
   I see the problem with very large jobs. Maybe we could solve it a bit 
differently, by deploying tasks in topological order when using `EAGER` 
scheduling.
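
   The topological-order idea could be sketched roughly as below. This is a hypothetical illustration (names like `topologicalOrder` and the adjacency-map job graph are made up, not Flink's actual scheduler API); the point is only that deploying producers before consumers means a consumer's partition request never races ahead of its producer's deployment.

```java
import java.util.*;

/** Hypothetical sketch: compute a deployment order for an EAGER-scheduled
 *  job so that every producer task is deployed before its consumers.
 *  The job graph is modeled as a producer -> consumers adjacency map;
 *  this is illustrative only, not Flink's ExecutionGraph API. */
public class TopologicalDeployment {

    /** Kahn's algorithm: repeatedly emit vertices with no undeployed producers. */
    static List<String> topologicalOrder(Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        edges.forEach((producer, consumers) -> {
            inDegree.putIfAbsent(producer, 0);
            for (String c : consumers) inDegree.merge(c, 1, Integer::sum);
        });
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((vertex, degree) -> { if (degree == 0) ready.add(vertex); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.poll();
            order.add(vertex);  // "deploy" the vertex: all its producers are already deployed
            for (String consumer : edges.getOrDefault(vertex, List.of())) {
                if (inDegree.merge(consumer, -1, Integer::sum) == 0) ready.add(consumer);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // source -> map -> sink: the source is deployed first, so the map's
        // partition request always finds its producer already running.
        Map<String, List<String>> jobGraph = Map.of(
            "source", List.of("map"),
            "map", List.of("sink"));
        System.out.println(topologicalOrder(jobGraph)); // prints [source, map, sink]
    }
}
```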
   
   Concerning your answer to my second question: what if the producer partition 
gets disposed (e.g. due to a failover which does not necessarily restart the 
downstream operators)? At the moment, an upstream task failure will always 
fail the downstream consumers. However, this can change in the future, and the 
more assumptions (e.g. that downstream operators will be failed if upstream 
operators fail) we bake in, the harder it becomes to change this behaviour. 
Moreover, I think it is always a good idea to make the components as 
self-contained as possible. This also entails that the failover behaviour 
should ideally not depend on other things happening. Therefore, I'm a bit 
hesitant to change the existing behaviour.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Too many requestPartitionState would crash JM
> ---------------------------------------------
>
>                 Key: FLINK-10319
>                 URL: https://issues.apache.org/jira/browse/FLINK-10319
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.7.0
>            Reporter: TisonKun
>            Assignee: TisonKun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Do not requestPartitionState from the JM on a partition request failure, as 
> this may generate too many RPC requests and block the JM.
> We gain little benefit from checking what state the producer is in, while on 
> the other hand the check can crash the JM with too many RPC requests. The task 
> can always retriggerPartitionRequest from its InputGate; the request will fail 
> if the producer is gone and succeed if the producer is alive. Either way, 
> there is no need to ask the JM for help.
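
The local-retry behaviour described in the issue could be sketched as follows. This is an illustrative stand-in, not Flink's actual `SingleInputGate` code: `retriggerPartitionRequest` and `maxRetries` are hypothetical names, and the producer is abstracted as a boolean-returning supplier. The point is that the consumer retries locally a bounded number of times and then fails on its own, without ever issuing a requestPartitionState RPC to the JM.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: on a failed partition request, the consumer task
 *  simply re-triggers the request from its input gate a bounded number of
 *  times instead of asking the JobManager for the producer's state. */
public class LocalPartitionRetry {

    /** Returns true if the partition request eventually succeeds (producer
     *  alive), false once the retries are exhausted (producer gone), in
     *  which case the task fails locally; no JM RPC is involved either way. */
    static boolean retriggerPartitionRequest(Supplier<Boolean> request, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (request.get()) {
                return true;  // producer is alive: partition request succeeded
            }
        }
        return false;  // producer is gone: give up and fail the task locally
    }

    public static void main(String[] args) {
        // Simulate a producer that becomes reachable on the third attempt.
        int[] calls = {0};
        boolean ok = retriggerPartitionRequest(() -> ++calls[0] >= 3, 5);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```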



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
