Anything that is reachable by the DoFn/CombineFn/*Fn needs to be serializable. [1] is saying that it is common to have an anonymous inner class for a DoFn which because of its serialization capture will get the encompassing class which is typically a PTransform. If you are careful about reachability, you can decide to not mark lots of things as serializable and this is good because it decreases the size of the serialized *Fn blob as well.
The [2] javadoc could be clarified that PTransform class supports serialization but is only serialized when part of the serialization capture of a DoFn/CombineFn/*Fn and otherwise will never be serialized. On Mon, Jan 6, 2020 at 10:19 AM Alexey Romanenko <[email protected]> wrote: > Hello all, > > I found myself that I’m a bit confused with Serialization requirements for > Beam transforms and I want to precise something. > > Here [1] it's clearly mentioned that “*DoFn, PTransform, CombineFn and > other instances will be serialized*”. Since the most of Beam IO > Read/Write transforms is based on PTransform, then it means that all > internal members of them should be serializable too or declared as > transient/static. > > In the same time, Javadoc of PTransform says [2] that “*PTransform doesn't > actually support serialization, despite implementing > Serializable. PTransform is marked Serializable solely because it is common > for an anonymous **DoFn, instance to be created within an apply() method > of a composite **PTransform*”. And, on the other hand, “*DoFn passed to a > ParDo transform must be Serializable*” [3] So, DoFn must be really > serializable, PTransform is not necessary. > > So, does it mean that the members (that are mostly AutoValue generated) of > Read/Write PTransforms are free to be serializable or not if they don’t use > anonymous DoFn's? For example, they are needed only for configuration on > driver. However, if these members are used in DoFn or in other user defined > objects further, when they will be involved on workers, then they must be > serializable in any way. Is it correct assumption? > Yes > > [1] > https://beam.apache.org/contribute/ptransform-style-guide/#serialization > [2] > https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116 > [3] > https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282 > >
