Thanks a lot for the comments. I updated the SEP with more details and
clarification. Please let me know if you have further questions.

Thanks,
Xinyu

On Thu, May 25, 2017 at 11:19 AM, Prateek Maheshwari <
pmaheshw...@linkedin.com.invalid> wrote:

> Hi Xinyu,
>
> Thanks for the proposal. Some requests for clarifications. Let's update the
> SEP directly instead of replying here.
>
> E.g., in "For any following intermediate stream whose input streams are all
> end-of-stream, it will be marked as pending EOS" - Should clarify that
> (IIUC) something is injecting EOS messages in all intermediate stream
> partitions once it receives EOS from all input stream partitions it's
> consuming. Should also clarify what is that something.
> Same for "declare end of stream once all the EOS messages have been
> received." - What does this declaration involve and who is doing this?
>
> In pro for approach 2: Not clear what this means - "The watermark can
> conclude the input messages before this watermark have been complete."
>
> For the cons of approach 2: "Complicated failure scenario of the second
> job. It needs to checkpoint all the watermark messages received, so when it
> recovered from failure, it can still count." - How is this related to EOS?
> How is this related to the checkpoint watermark section?
> Also, what is the "more messages required to write.. " referring to?
>
> "Samza needs to reconcile based on the task counts." - Please explain what
> reconciliation means, why it needs to happen, and why we need to track the
> producer task and total task count in the watermark message to do this.
>
> Checkpoint watermarks section is also unclear. What problem are we trying
> to solve here?
>
> Should also move the message format and the watermark message interface
> sections to the bottom, since they depend on details in the event time and
> checkpoint watermark sections.
>
> Thanks,
> Prateek
>
>
> On Wed, May 24, 2017 at 11:30 AM, xinyu liu <xinyuliu...@gmail.com> wrote:
>
> > Hi all,
> >
> > I created SEP-6 for SAMZA-1260
> > <https://issues.apache.org/jira/browse/SAMZA-1260>: Support Watermark
> > Across Intermediate Streams for Batch Processing. The link to the SEP is
> > here:
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 6+Support+Watermark+Across+Intermediate+Streams+for+Batch+Processing
> >
> > Please review and comments are welcome!
> >
> > Thanks,
> > Xinyu
> >
>

Reply via email to