On Wed, Jun 8, 2016 at 10:13 AM, Ben Chambers <bchamb...@google.com.invalid> wrote:
> - If failure occurs after finishBundle() but before the consumption is > committed, then the bundle may be reprocessed, which leads to duplicated > calls to processElement() and finishBundle(). > > - If failure occurs after consumption is committed but before > finishBundle(), then those elements which may have buffered state in the > DoFn but not had their side-effects fully processed (since the > finishBundle() was responsible for that) are lost. > I am trying to understand this better. Does this mean during recovery/replay after a failure, the particular instance of DoFn that existed before the worker failure would not be discarded, but might still receive elements? If a DoFn is caching some internal state, it should always assume the worker its on might abruptly fail anytime and the state would be lost, right?