Thanks a lot for the comments. I have updated the SEP with more details and clarifications. Please let me know if you have further questions.

Thanks,
Xinyu

On Thu, May 25, 2017 at 11:19 AM, Prateek Maheshwari <pmaheshw...@linkedin.com.invalid> wrote:

> Hi Xinyu,
>
> Thanks for the proposal. I have some requests for clarification. Let's
> update the SEP directly instead of replying here.
>
> E.g., in "For any following intermediate stream whose input streams are all
> end-of-stream, it will be marked as pending EOS" - This should clarify that
> (IIUC) something is injecting EOS messages into all intermediate stream
> partitions once it receives EOS from all the input stream partitions it's
> consuming. It should also clarify what that something is.
> Same for "declare end of stream once all the EOS messages have been
> received." - What does this declaration involve, and who is doing it?
>
> In the pros for approach 2: It's not clear what this means - "The watermark
> can conclude the input messages before this watermark have been complete."
>
> For the cons of approach 2: "Complicated failure scenario of the second
> job. It needs to checkpoint all the watermark messages received, so when it
> recovered from failure, it can still count." - How is this related to EOS?
> How is this related to the checkpoint watermark section?
> Also, what is the "more messages required to write.. " referring to?
>
> "Samza needs to reconcile based on the task counts." - Please explain what
> reconciliation means, why it needs to happen, and why we need to track the
> producer task and total task count in the watermark message to do this.
>
> The checkpoint watermarks section is also unclear. What problem are we
> trying to solve here?
>
> We should also move the message format and the watermark message interface
> sections to the bottom, since they depend on details in the event time and
> checkpoint watermark sections.
>
> Thanks,
> Prateek
>
>
> On Wed, May 24, 2017 at 11:30 AM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > Hi all,
> >
> > I created SEP-6 for SAMZA-1260
> > <https://issues.apache.org/jira/browse/SAMZA-1260>: Support Watermark
> > Across Intermediate Streams for Batch Processing. The link to the SEP is
> > here:
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-6+Support+Watermark+Across+Intermediate+Streams+for+Batch+Processing
> >
> > Please review; comments are welcome!
> >
> > Thanks,
> > Xinyu