[ 
https://issues.apache.org/jira/browse/FLINK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087855#comment-17087855
 ] 

Piotr Nowojski commented on FLINK-17012:
----------------------------------------

{quote}
As a general thought: My take is that almost always (with very very few 
exceptions), "initialize()" or "configure()" methods should be seen as 
anti-patterns and be avoided if possible.
{quote}
Generally speaking I completely agree with this, but I'm not sure how feasible 
is any refactor towards this direction as part of this ticket. 
{quote}
but I think that it may not be good to do much IO and user code 
initialization(including state intialization and Function#open)
{quote}
I'm not sure if I understand what difference does it make if such code is 
executed from the constructor, or from an {{init()}} method called just after 
the constructor. Handling exception as [~sewen] pointed out is basically the 
same either way.

It might be tempting to just move 
{{org.apache.flink.streaming.runtime.tasks.StreamTask#beforeInvoke}} to the 
constructor? But it might require more refactoring to make it clean, as 
{{beforeInvoke()}} is calling an abstract {{StreamTask#init()}} method. Even 
ignoring the abstract method, having such long and complex call stacks from the 
constructor is asking yourself for troubles with hitting problems that 
{{StreamTask}}'s state will be inconsistent/not fully constructed in the middle 
of some method call. We have a similar problem right now, but not quite the 
same. It looks to me like {{StreamTask}} is trying to be it's own factory.

We could try to come up with some stop gap solution, converting `StreamTask` to 
a factory (and providing a factory for every of the `StreamTask's` subclasses). 
This `StreamTaskFactory` would still extend from `AbstractInvocable`, so we 
could try avoiding touching non streaming invocables, but still that's quite a 
bit of work, with only partial solutions :/ 

But that's a bit of off topic. Regardless of the way classes are constructed, 
there still could be a value for having an extra informative initialization 
state, that would be set in the middle of constructor. Whether we should add 
such step or not is kind of orthogonal. If there is no need for another state, 
we can also think about just adding some new mechanism for 
{{AbstractInvokable}} to report when it's actually running back to the 
{{Task}}, so that {{Task}} could postpone switching to {{RUNNING}} state for 
{{StreamTask}}.

> Expose stage of task initialization
> -----------------------------------
>
>                 Key: FLINK-17012
>                 URL: https://issues.apache.org/jira/browse/FLINK-17012
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / Task
>            Reporter: Wenlong Lyu
>            Priority: Major
>
> Currently a task switches to running before fully initialized, does not take 
> state initialization and operator initialization(#open ) in to account, which 
> may take long time to finish. As a result, there would be a weird phenomenon 
> that all tasks are running but throughput is 0. 
> I think it could be good if we can expose the initialization stage of tasks. 
> What to you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to