dxichen commented on pull request #1506: URL: https://github.com/apache/samza/pull/1506#issuecomment-871746930
> > > My initial thoughts are that if we think it is useful for the general TaskModel API to separate out the side inputs (for use by actual applications), then that could help the implementation here > > > > > > That is one of the benefits of having the divide exposed in the TaskModel. While that requires concrete use cases from the application perspective, the other purpose was to simplify a lot of flow within ContainerStorageManager and the newly introduced classes for state restoration which right now has needs to know which SSPs are side inputs vs changelog SSPs vs input SSPs. @dxichen do you have any inputs here to see if this would benefit/simplify the existing implementation? > > If we decide against it, then the changes looks good to me. It is useful to understand the long term path and potential clean up options if we want to remove this hack. > > Do we need to put all of this information in the main `TaskModel` API layer though? I would imagine that the state restoration layer can have multiple implementations, so it could be helpful to expose this granular SSP information somewhere, but if we put the info in `TaskModel`, would that be overloading the scope of `TaskModel` too much? Maybe not, because this is all I/O that a task is dealing with. What do others think? The state restore changes would indirectly benefit from having the sideinputs SSPs separated for the main SSPs in the task model since ContainerStorageManager would be simplified if that where the case. Currently CSM also does much of the config parsing to determine the store names/SSPs of the sideinputs. This would be a good step to simplify the overloaded CSM implementation. Adding it to the task model maybe help the specific impls of restore manager layer with multiple implementations to decide whether they want to restore for side input topics, but currently that is not used in Blob or changelog restore. Note that all of this for parsing and side input handlers initiation is happening in the CSM constructor, meaning that the TaskInstance is not yet created at the time and therefore the TaskContext is not instantiated either. CSM could only use that TaskModel currently. In general I think this side input info could live in task model as well from the perspective of simplifying CSM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
