1. Create an empty checkpoint file for the stateless operators. 2. Remove the logic to treat stateless operators as a special case.
Rest of the design remains as is. On Wed, Mar 1, 2017 at 11:18 AM Amol Kekre <[email protected]> wrote: > The third option should be it. > 1. On relaunch the DAG should start at commitWindowId > 2. Pruning of checkpoints should only happen after committedWindowId is > written by Stram state > > Thks > Amol > > > > E:[email protected] | M: 510-449-2606 <(510)%20449-2606> | Twitter: > @*amolhkekre* > > www.datatorrent.com | apex.apache.org > > *Join us at Apex Big Data World-San Jose > <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!* > [image: http://www.apexbigdata.com/san-jose-register.html] > <http://www.apexbigdata.com/san-jose-register.html> > > On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <[email protected]> wrote: > > > Help Needed for APEXCORE-619 > > > > Issue : When application is relaunched after long time with stateless > > opeartors at the end of the DAG, the stateless operators starts with a > very > > high windowId. In this case the stateless operator ignors all the data > > received till upstream operator catches up with it. This breaks the > > *at-least-once* gaurantee while relaunch of the opeartor or when master > is > > killed and application is restarted. > > > > Solutions: > > - Fix windowId for stateless leaf operators from upstream opeartor. But > it > > has some issues when we have a join with two upstrams operators at > > different windowId. If we set the windowID to min(upstream windowId), > then > > we need to again recalulate the new recovery window ids for upstream > paths > > from this operators. > > > > - Other solution is to create a empty file in checkpoint directory for > > stateless operators. This will help us to identify the checkpoints of > > stateless operators during relaunch instead of computing from latest > > timestamp. > > > > - Bring the entire DAG to committedWindowId. This could be achived using > > writing committedWindowId in a journal. we need to make sure that we are > > not puring the checkpointed state until the committedWundowId is saved in > > journal. > > > > Let me know your thoughs on this and preferred solution. > > > > Regards, > > -Tushar. > > > -- *Join us at Apex Big Data World-San Jose <http://www.apexbigdata.com/san-jose.html>, April 4, 2017!* [image: http://www.apexbigdata.com/san-jose-register.html]
