On Fri, Feb 16, 2018 at 1:00 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> The serialization of fn being once per bundle, the perf impact is only
> huge if there is a bug somewhere else, even java serialization is
> negligible on big config compared to any small pipeline (seconds vs
> minutes).
>
Profiling makes it clear that this has a huge performance impact. One of the most important backwards-incompatible changes we made for Beam 2.0.0 was to allow Fn reuse across bundles.

When a DoFn was used for only one bundle, there was no @Teardown because it had almost no use: you did everything in @FinishBundle. So for whatever use case you are working on, if your pipeline performs well enough doing it per bundle, you can put it in @FinishBundle. Of course, @FinishBundle still might not get called, because guaranteeing that call is a logical impossibility (a worker can die at any moment). All you know is that, for a given element, the element will be retried if @FinishBundle fails.

If you have cleanup logic that absolutely must get executed, then you need to build a composite PTransform around it so it will be retried until cleanup succeeds. You can find many examples in Beam's sinks.

Kenn
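
To make the retry semantics concrete, here is a rough, plain-Java toy model of "the bundle is retried if the per-bundle finish step fails". It deliberately does not use the Beam SDK: `BundleFn`, `runBundleWithRetry`, and `FlakyBufferingFn` are hypothetical names invented for this sketch, and a real runner's retry policy is more involved than a simple loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BundleRetrySketch {

    /** Hypothetical stand-in for a per-bundle function. This is NOT Beam's DoFn API. */
    interface BundleFn<T> {
        void startBundle();
        void processElement(T element);
        void finishBundle() throws Exception; // per-bundle flush/cleanup goes here
    }

    /**
     * Runs one bundle, retrying the whole bundle until finishBundle succeeds
     * or maxAttempts is exhausted. Returns the number of attempts used.
     */
    static <T> int runBundleWithRetry(BundleFn<T> fn, List<T> bundle, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                fn.startBundle();
                for (T element : bundle) {
                    fn.processElement(element);
                }
                fn.finishBundle();
                return attempt; // bundle committed
            } catch (Exception e) {
                // The bundle failed, so none of its elements count as done;
                // the loop reprocesses every element on the next attempt.
            }
        }
        throw new IllegalStateException("bundle failed after " + maxAttempts + " attempts");
    }

    /** Buffers elements and flushes them in finishBundle; the first N flushes fail. */
    static class FlakyBufferingFn implements BundleFn<String> {
        final List<String> committed = new ArrayList<>();
        private final List<String> buffer = new ArrayList<>();
        private int flushFailuresLeft;

        FlakyBufferingFn(int flushFailures) {
            this.flushFailuresLeft = flushFailures;
        }

        @Override
        public void startBundle() {
            buffer.clear(); // drop leftovers from a failed attempt
        }

        @Override
        public void processElement(String element) {
            buffer.add(element);
        }

        @Override
        public void finishBundle() throws Exception {
            if (flushFailuresLeft > 0) {
                flushFailuresLeft--;
                throw new Exception("simulated flush failure");
            }
            committed.addAll(buffer);
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        FlakyBufferingFn fn = new FlakyBufferingFn(2);
        int attempts = runBundleWithRetry(fn, Arrays.asList("a", "b", "c"), 5);
        System.out.println("attempts=" + attempts + " committed=" + fn.committed);
    }
}
```

In this model the elements are committed exactly once, and only after the finish step finally succeeds. That is the property you can lean on when you put per-bundle work in the finish step; nothing here guarantees that the finish step runs on an attempt that crashes partway through.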