cameronlee314 commented on pull request #1506:
URL: https://github.com/apache/samza/pull/1506#issuecomment-869951777


   > 1. Would it better to have a separation within TaskModel instead of 
conflating SSPs into both inputs & side inputs? Doing so, will eliminate yet 
another hack addition to `TaskContextImpl` and potentially solve scenarios that 
require this divide information upstream. Checking if you have evaluated this 
option and what the initial thoughts are.
   
   I didn't really consider changing the `TaskModel`. My initial thoughts are 
that if we think it is useful for the general `TaskModel` API to separate out 
the side inputs (for use by actual applications), then that could help the 
implementation here. However, if this won't be helpful for the general API, 
then I don't think we should bloat/modify the API just to fix this issue (for 
backwards compat, we probably would need to add a new field and not modify the 
current one). I'm not currently convinced that this is useful enough for the 
general API, so I don't think that we should change the general API for this 
bug fix. What do you think?
   I think the root reason for why we need this `TaskContextImpl` hack is 
because we don't have a clean way to wire in internal components for the 
high-level implementation (e.g. we also need the `TaskContextImpl` hack to wire 
through `StreamMetadataCache` into high-level). I don't see a clean way to fix 
this root reason right now though.
   
   > 3. Looks like fix for SAMZA-2300 is going in as part of this PR. Why is 
that? Can we separate it into another PR to keep the scope of the PR to single 
issue?
   
   Answering #3 first: In order to test the side inputs fix (i.e. 
`TestLocalTableWithSideInputsEndToEnd`), I needed to modify existing tests to 
use multiple partitions. The current tests (only single partition) don't rely 
on the end-of-stream propagation. This is why we didn't recognize this bug with 
the current tests. However, when I switch to using multiple partitions, then 
the bug in SAMZA-2300 gets triggered. So it was not ideal to split them up. I 
also don't think it is ideal to fix both tickets in the same commit (since it 
makes the single commit more complex), but I felt that the benefit towards 
having more complete tests for the side inputs fix (by merging into a single 
commit) was better. 
   
   > 2. Can we separate out the tests into two categories (one that needs to 
belong as part of the fix vs one that isn't) and have a separate PR for the 
latter category of tests?
   
   I believe the only test changes that aren't as directly related to the side 
inputs fix is `TestLocalTableEndToEnd`. However, it does help add confidence 
that the SAMZA-2300 change is working as expected, so that is why I included 
the changes to `TestLocalTableEndToEnd` here. I do have another set of changes 
for migrating other tests which I did keep separate (I will post a PR for those 
when this one is done), but I wanted to at least have this one test in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to