On Fri, Feb 16, 2018 at 1:00 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:
>
> The serialization of fn being once per bundle, the perf impact is only
> huge if there is a bug somewhere else, even java serialization is
> negligible on big config compared to any small pipeline (seconds vs
> minutes).
>

Profiling clearly shows that this has a huge performance impact. One of the
most important backwards-incompatible changes we made for Beam 2.0.0 was to
allow Fn reuse across bundles.

When we used a DoFn for only one bundle, there was no @Teardown because it
had ~no use: you did everything in @FinishBundle. So for whatever use case
you are working on, if your pipeline performs well enough doing it per
bundle, you can put it in @FinishBundle. Of course, it still might not get
called, because guaranteeing the call is a logical impossibility. All you
know is that, for a given element, the element will be retried if
@FinishBundle fails.
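
To make the retry guarantee concrete, here is a minimal plain-Java sketch of
the lifecycle described above. It is not the Beam API: BundleFn, Recorder,
BufferingFn and ToyRunner are hypothetical names invented for illustration,
and the assumption that a runner discards an instance whose bundle failed is
a simplification of what real runners do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a DoFn-like lifecycle; NOT the Beam API.
interface BundleFn {
    void setup();                          // once per instance
    void processElement(String element);   // once per element
    void finishBundle() throws Exception;  // once per bundle attempt
}

// Shared bookkeeping so the effect of retries is observable from outside.
class Recorder {
    int setupCalls = 0;
    int flushAttempts = 0;
    final List<String> committed = new ArrayList<>();
}

// Buffers elements and commits them in finishBundle.
// The very first flush fails on purpose to simulate a transient error.
class BufferingFn implements BundleFn {
    private final Recorder rec;
    private final List<String> buffer = new ArrayList<>();

    BufferingFn(Recorder rec) { this.rec = rec; }

    public void setup() { rec.setupCalls++; }

    public void processElement(String element) { buffer.add(element); }

    public void finishBundle() throws Exception {
        rec.flushAttempts++;
        if (rec.flushAttempts == 1) {
            throw new Exception("simulated flush failure");
        }
        rec.committed.addAll(buffer);
        buffer.clear();
    }
}

// Toy runner: reuses one instance across bundles. If a bundle fails, the
// instance is discarded (an assumption of this sketch) and the whole bundle
// is retried on a fresh instance.
class ToyRunner {
    static void run(Supplier<BundleFn> factory, List<List<String>> bundles) {
        BundleFn fn = factory.get();
        fn.setup();
        for (List<String> bundle : bundles) {
            while (true) {
                try {
                    for (String element : bundle) {
                        fn.processElement(element);
                    }
                    fn.finishBundle();
                    break; // bundle committed, move on with the same instance
                } catch (Exception ex) {
                    fn = factory.get(); // drop the failed instance
                    fn.setup();
                }
            }
        }
    }
}
```

Running ToyRunner.run over the bundles [a, b] and [c] commits a, b, c with
two setup calls and three flush attempts: the first bundle is retried in full
after its flush fails, and the second bundle reuses the surviving instance
without another setup. Note the sketch has no teardown at all, which mirrors
the point that per-bundle work belongs in the finish-bundle step.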

If you have cleanup logic that absolutely must get executed, then you need
to build a composite PTransform around it so it will be retried until
cleanup succeeds. In Beam's sinks you can find many examples.

Kenn
