Hi Shen,

The only guarantee splitAtFraction makes is that the primary source + the residual source = the original source; there are no other guarantees. So "approximately" only means the split point may not land exactly at the requested fraction - no data is lost as long as both parts are read.

For checkpointing, you can use the following pattern: when you want to checkpoint, call splitAtFraction(getFractionConsumed() + epsilon) or something like that [the API is not great - ideally it would have been splitAfterFractionOfRemainder(0.0), if there were such a method]. That means "stop as soon as possible".

After this call succeeds, its return value is a checkpoint to resume from. However, you still need to let the current reader complete - it may have a few more records or even blocks left [with SDF, there's a similar but more explicit method, checkpoint(), that guarantees there will be no more blocks].
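A minimal sketch of that pattern in plain Java, in case it helps. It does not use the Beam classes: OffsetRangeReader is a hypothetical toy reader over an offset range [start, end), and its splitAtFraction returns the residual range as a long[] rather than a BoundedSource. The epsilon value of 1e-9 and the ceil-based rounding of the split point are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy reader over offsets [start, end); a stand-in, not Beam's BoundedReader. */
final class OffsetRangeReader {
    private final long start;
    private long end;      // exclusive; shrinks when a split succeeds
    private long current;  // last offset returned; start - 1 before the first advance()

    OffsetRangeReader(long start, long end) {
        this.start = start;
        this.end = end;
        this.current = start - 1;
    }

    /** Moves to the next record; false once the (possibly shrunk) range is exhausted. */
    boolean advance() {
        if (current + 1 >= end) {
            return false;
        }
        current++;
        return true;
    }

    long getCurrent() {
        return current;
    }

    double getFractionConsumed() {
        return end == start ? 1.0 : (double) (current + 1 - start) / (end - start);
    }

    /**
     * Tries to split at roughly the given fraction. On success this reader keeps the
     * primary [start, split) and the residual {split, end} is returned; on refusal
     * the range is left unchanged and null is returned.
     */
    long[] splitAtFraction(double fraction) {
        long split = start + (long) Math.ceil(fraction * (end - start));
        if (split <= current || split >= end) {
            return null; // split point already consumed, or nothing left to hand off
        }
        long[] residual = {split, end};
        end = split;
        return residual;
    }
}

class CheckpointDemo {
    /** Reads some records, checkpoints, drains the primary, then resumes from the residual. */
    static List<Long> readWithCheckpoint(long start, long end, int recordsBeforeCheckpoint) {
        List<Long> seen = new ArrayList<>();
        OffsetRangeReader reader = new OffsetRangeReader(start, end);
        for (int i = 0; i < recordsBeforeCheckpoint && reader.advance(); i++) {
            seen.add(reader.getCurrent());
        }

        // "Stop as soon as possible": the returned residual is the checkpoint.
        long[] checkpoint = reader.splitAtFraction(reader.getFractionConsumed() + 1e-9);

        // The current reader must still be drained; it may hold a few more records.
        while (reader.advance()) {
            seen.add(reader.getCurrent());
        }

        // Later (e.g. after recovery), resume from the checkpoint if the split succeeded.
        if (checkpoint != null) {
            OffsetRangeReader resumed = new OffsetRangeReader(checkpoint[0], checkpoint[1]);
            while (resumed.advance()) {
                seen.add(resumed.getCurrent());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Long> seen = readWithCheckpoint(0, 100, 40);
        System.out.println(seen.size()); // prints 100: primary + residual cover the whole range
    }
}
```

In this run the split lands at offset 41, so the original reader still returns one more record (offset 40) after the checkpoint is taken, and the residual covers 41 through 99. If the split is refused, the reader simply keeps its full range, so nothing is skipped either way.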
On Wed, May 16, 2018 at 11:14 AM Shen Li <cs.she...@gmail.com> wrote:

> Hi,
>
> After recovering from a checkpoint, is it correct to use
> BoundedSource.BoundedReader#splitAtFraction(double) to resume a
> BoundedSource? My concern is that the doc says "the new range would contain
> *approximately* the given fraction of the amount of data in the current
> range." Does the word *approximately* here mean that the application could
> potentially miss some data from the BoundedSource if resume
> from reader.splitAtFraction(fraction)?
>
> Thanks,
> Shen