Re: Checkpointing and Restoring BoundedSource

Eugene Kirpichov Wed, 16 May 2018 11:28:40 -0700

Hi Shen,
The only guarantee made by splitAtFraction is that the primary source +
residual source = original source - there are no other guarantees.
For checkpointing, you can use the following pattern:
When you want to checkpoint, call splitAtFraction(getFractionConsumed() +
epsilon) or something like that [the API is not great - ideally it would
have been splitAfterFractionOfRemainder(0.0), if there were such a method].
That means "stop as soon as possible". If the call succeeds, its return
value (the residual source) is a checkpoint to resume from. However, you
still need to let the current reader run to completion - it may have a few
more records or even blocks left [with SDF, there's a similar but more
explicit method checkpoint() that guarantees that there'll be no more
blocks].
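
A rough sketch of that pattern, written in plain Java so it runs without
Beam on the classpath. Range and RangeReader are hypothetical stand-ins that
only mimic the splitAtFraction / getFractionConsumed contract over a range of
integer offsets; they are not Beam classes, and the epsilon of 1e-9 is an
arbitrary assumption:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the checkpoint pattern; NOT Beam's BoundedSource API. */
class CheckpointSketch {

    /** Half-open range [start, end) of record offsets (hypothetical type). */
    static final class Range {
        final long start;
        final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
    }

    /** Hypothetical reader that mimics the splitAtFraction contract. */
    static final class RangeReader {
        private final long start;
        private long end;   // shrinks when a split succeeds
        private long next;  // next offset that advance() will return

        RangeReader(Range r) { start = r.start; end = r.end; next = r.start; }

        boolean advance() {
            if (next >= end) return false;
            next++;
            return true;
        }

        long current() { return next - 1; }

        double getFractionConsumed() {
            return end == start ? 1.0 : (double) (next - start) / (end - start);
        }

        /** Returns the residual range, or null if the split is rejected. */
        Range splitAtFraction(double fraction) {
            long splitPos = start + (long) Math.ceil(fraction * (end - start));
            // Reject splits inside already-consumed data or with an empty residual.
            if (splitPos < next || splitPos >= end) return null;
            Range residual = new Range(splitPos, end);
            end = splitPos; // the primary is now [start, splitPos)
            return residual;
        }
    }

    /** Reads a few records, checkpoints, drains the reader, then resumes. */
    static List<Long> readWithCheckpoint(Range original, int recordsBeforeCheckpoint) {
        List<Long> seen = new ArrayList<>();
        RangeReader reader = new RangeReader(original);
        for (int i = 0; i < recordsBeforeCheckpoint && reader.advance(); i++) {
            seen.add(reader.current());
        }
        // "Stop as soon as possible": split just past what has been consumed.
        Range checkpoint = reader.splitAtFraction(reader.getFractionConsumed() + 1e-9);
        // The current reader may still hold a few records; let it finish.
        while (reader.advance()) {
            seen.add(reader.current());
        }
        // Resume from the checkpoint. A null checkpoint means the split was
        // rejected, in which case the reader above already read everything.
        if (checkpoint != null) {
            RangeReader resumed = new RangeReader(checkpoint);
            while (resumed.advance()) {
                seen.add(resumed.current());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readWithCheckpoint(new Range(0, 10), 3));
    }
}
```

In this toy model, with 10 records and a checkpoint requested after 3, the
split lands at offset 4: the current reader still returns one more record
(offset 3) before it finishes, and the residual [4, 10) picks up from there,
so primary + residual = original.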


On Wed, May 16, 2018 at 11:14 AM Shen Li <cs.she...@gmail.com> wrote:

> Hi,
>
> After recovering from a checkpoint, is it correct to use
> BoundedSource.BoundedReader#splitAtFraction(double) to resume a
> BoundedSource? My concern is that the doc says "the new range would contain
> *approximately* the given fraction of the amount of data in the current
> range." Does the word *approximately* here mean that the application could
> potentially miss some data from the BoundedSource if it resumes
> from reader.splitAtFraction(fraction)?
>
> Thanks,
> Shen
>
