Can't the commitedWindowId be calculated by looking at the physical plan
and the existing checkpoints?

On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <tus...@apache.org> wrote:

> Help Needed for APEXCORE-619
>
> Issue : When application is relaunched after long time with stateless
> opeartors at the end of the DAG, the stateless operators starts with a very
> high windowId. In this case the stateless operator ignors all the data
> received till upstream operator catches up with it. This breaks the
> *at-least-once* gaurantee while relaunch of the opeartor or when master is
> killed and application is restarted.
>
> Solutions:
> - Fix windowId for stateless leaf operators from upstream opeartor. But it
> has some issues when we have a join with two upstrams operators at
> different windowId. If we set the windowID to min(upstream windowId), then
> we need to again recalulate the new recovery window ids for upstream paths
> from this operators.
>
> - Other solution is to create a empty file in checkpoint directory for
> stateless operators. This will help us to identify the checkpoints of
> stateless operators during relaunch instead of computing from latest
> timestamp.
>
> - Bring the entire DAG to committedWindowId. This could be achived using
> writing committedWindowId in a journal. we need to make sure that we are
> not puring the checkpointed state until the committedWundowId is saved in
> journal.
>
> Let me know your thoughs on this and preferred solution.
>
> Regards,
> -Tushar.
>

Reply via email to