Hello all, I'm a bit confused about the serialization requirements for Beam transforms and would like to clarify something.
The style guide [1] clearly states that “DoFn, PTransform, CombineFn and other instances will be serialized”. Since most Beam IO Read/Write transforms are based on PTransform, this would mean that all of their internal members must either be serializable or be declared transient/static.

At the same time, the Javadoc of PTransform says [2] that “PTransform doesn't actually support serialization, despite implementing Serializable. PTransform is marked Serializable solely because it is common for an anonymous DoFn, instance to be created within an apply() method of a composite PTransform”.

On the other hand, “DoFn passed to a ParDo transform must be Serializable” [3]. So a DoFn really must be serializable, while a PTransform does not necessarily have to be.

Does this mean that the members of Read/Write PTransforms (which are mostly AutoValue-generated) are free to be serializable or not, as long as the transform doesn't use anonymous DoFns? For example, when they are needed only for configuration on the driver. However, if these members are later used in a DoFn or in other user-defined objects that run on the workers, then they must be serializable in any case.

Is this a correct assumption?

[1] https://beam.apache.org/contribute/ptransform-style-guide/#serialization
[2] https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116
[3] https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282
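
P.S. To make the question more concrete, here is a minimal plain-Java sketch of the situation I have in mind. It deliberately uses no Beam classes: DriverOnlyClient, FnWithPlainField and FnWithTransientField are hypothetical stand-ins (the latter two for a DoFn-like object that is shipped to workers), and canSerialize is just a helper I made up for illustration. It only shows the standard Java serialization behaviour that I assume applies here: a Serializable object with a non-serializable, non-transient member fails to serialize, while marking that member transient avoids the failure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

  // Hypothetical driver-only helper; note that it does NOT implement Serializable.
  static class DriverOnlyClient {
    final String endpoint;

    DriverOnlyClient(String endpoint) {
      this.endpoint = endpoint;
    }
  }

  // Stand-in for a DoFn-like object holding the helper as a regular field.
  static class FnWithPlainField implements Serializable {
    final DriverOnlyClient client = new DriverOnlyClient("host:1234");
  }

  // Same object, but the helper is transient and therefore skipped on serialization.
  static class FnWithTransientField implements Serializable {
    transient DriverOnlyClient client = new DriverOnlyClient("host:1234");
  }

  // Returns true if standard Java serialization of the object succeeds.
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("plain field:     " + canSerialize(new FnWithPlainField()));
    System.out.println("transient field: " + canSerialize(new FnWithTransientField()));
  }
}
```

A transient field is null after deserialization, so in a real DoFn it would presumably have to be re-created on the worker; whether the same care is needed for members of a PTransform that never leave the driver is exactly what I am asking about.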
