pnowojski edited a comment on issue #11403: [FLINK-16316][operators] Implement 
new StreamOperatorBase as a replacement for AbstractStreamOperator
URL: https://github.com/apache/flink/pull/11403#issuecomment-600149707
 
 
   Thanks @rkhachatryan for writing your thoughts down. Responding to your 
overall comment (without inline comments).
   
   > To my understanding, the problem of setup() method is non-final variables 
(and therefore unneseccary "lifecycle" in contract).
   > They exists for two reasons:
   >
   > 1. too many responsibilites
   > 2. it was too difficult to add a proper constructor to all hierarchy
   The former (responsibilites), I think, is much more important and should be 
addressed first. It's the reason for 2nd and has other implications.
   
   I don't see it like that. Regardless of the responsibilities of the 
`AbstractStreamOperator` (and whether there should be fewer of them), the 
`#setup()` shouldn't existed. It's an artefact of mostly broken construction of 
the operators on the client side, where there is no access to the runtime 
things, and then serialising, distributing them to TMs, deserialising. Whether 
`AbstractStreamOperator` should have fewer (or not) responsibilities, is 
completely orthogonal problem.
   
   > And, looking from a higher level perspective, the StreamOperator interface 
which it implements is also broken (IMO):
   > (...)
   > With better interfaces we wouldn't worry about having misplaces/unused 
fields, and therefore about their initialization.
   
   Maybe some of them would disappear, maybe others wouldn't. Hard to say for 
me.
   
   > Both efforts could be done incrementally (StreamOperator could be split by 
incrementally pulling it's methods into new interfaces). Interfaces are 
@PublicEvolving.
   
   I honestly do not even know how all of this should look like if we could 
start from scratch, so it's hard for me to see incremental path to solve those 
problems. There are couple of them from my perspective:
   1. it feels to me like `StreamOperatorStateHandler` should be not a field 
inside `StreamOperatorBase`, but a wrapper around it, but...
   2. `AbstractStreamOperator` (and hence I assume `StreamOperatorBase` as 
well), needs to expose things from within `StreamOperatorStateHandler`. 
   3. Purpose of `AbstractStreamOperator` is to provide a convenient class, so 
that users do not have to implement all of the methods manually.
   
   putting those three together, it's hard for me to imagine a solution, where 
we split  `StreamOperatorBase` into smaller interfaces, without providing some 
abstract classes that are inter connected between one another in some kind of 
spaghetti net. More over:
   
   4. Ideally we would like to provide `OneInput` and `TwoInput` abstract 
versions of the `StreamOperatorBase` - maybe this requirement can disappear, if 
we force everyone to implement `MultipleInput` interface, even if operator has 
a single input.
   5. There is also a need for more specialised base class: 
`AbstractUdfOperator`
   
   This means, without multiple inheritance it's impossible to provide abstract 
classes for combination of UDF/non-udf vs OneInput/TwoInput/MultipleInput, 
without hitting the problem of `AbstractStreamOperator#processWatermark1` 
(indirect/implicit overriding of method that's not defined in 
`AbstractStreamOperator`).
   
   If you would like to split it even further to things like keyed and 
not-keyed operators, those two problems (spaghetti/web of connections and 
multiple inheritance) are going to be more profound.
   
   I don't know at the moment how to tackle those problems, but for me they are 
out of scope of this PR/JIRA/FLIP :/ My purpose here was to solve some of the 
problems that I was aware before (non final transient methods, and 
incompatibility with `MultipleInputStreamOperator` interface). But the things 
we are discussing here, would have to include much broader audience and have 
its own FLIP/voting and at the moment I'm not sure if community would finally 
agree on splitting the abstract class/interfaces because of some other trade 
offs.
   
   I would like to avoid blocking this N-ary FLIP with those discussions.
   
   > Having said that, I think that grouping state operations and bundling init 
parameters into a container are steps in the right direction.
   
   Does it mean that you are generally speaking fine with proceeding with the 
PR as it is? I mean the general concept of `StreamOperatorInitializer`, and 
`StreamOperatorStateHandler`(as field of `AbstractStreamOperator` and 
`StreamOperatorBase`)? Of course % the inline comments which I haven't got time 
to read/address yet?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to