[GitHub] [samza] dxichen commented on pull request #1506: SAMZA-2303: Exclude side inputs when handling end-of-stream and watermarks for high-level

GitBox Wed, 30 Jun 2021 14:43:22 -0700


dxichen commented on pull request #1506:
URL: https://github.com/apache/samza/pull/1506#issuecomment-871746930



   > > > My initial thoughts are that if we think it is useful for the general 
TaskModel API to separate out the side inputs (for use by actual applications), 
then that could help the implementation here
   > > 
   > > 
   > > That is one of the benefits of having the divide exposed in the 
TaskModel. While that requires concrete use cases from the application 
perspective, the other purpose was to simplify a lot of flow within 
ContainerStorageManager and the newly introduced classes for state restoration 
which right now has needs to know which SSPs are side inputs vs changelog SSPs 
vs input SSPs. @dxichen do you have any inputs here to see if this would 
benefit/simplify the existing implementation?
   > > If we decide against it, then the changes looks good to me. It is useful 
to understand the long term path and potential clean up options if we want to 
remove this hack.
   > 
   > Do we need to put all of this information in the main `TaskModel` API 
layer though? I would imagine that the state restoration layer can have 
multiple implementations, so it could be helpful to expose this granular SSP 
information somewhere, but if we put the info in `TaskModel`, would that be 
overloading the scope of `TaskModel` too much? Maybe not, because this is all 
I/O that a task is dealing with. What do others think?
   
   The state restore changes would indirectly benefit from having the 
sideinputs SSPs separated for the main SSPs in the task model since 
ContainerStorageManager would be simplified if that where the case. Currently 
CSM also does much of the config parsing to determine the store names/SSPs of 
the sideinputs. This would be a good step to simplify the overloaded CSM 
implementation. Adding it to the task model maybe help the specific impls of 
restore manager layer with multiple implementations to decide whether they want 
to restore for side input topics, but currently that is not used in Blob or 
changelog restore.
   Note that all of this for parsing and side input handlers initiation is 
happening in the CSM constructor, meaning that the TaskInstance is not yet 
created at the time and therefore the TaskContext is not instantiated either. 
CSM could only use that TaskModel currently. In general I think this side input 
info could live in task model as well from the perspective of simplifying CSM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [samza] dxichen commented on pull request #1506: SAMZA-2303: Exclude side inputs when handling end-of-stream and watermarks for high-level

Reply via email to