Hello all,

I found myself that I’m a bit confused with Serialization requirements for Beam 
transforms and I want to precise something.

Here [1] it's clearly mentioned that “DoFn, PTransform, CombineFn and other 
instances will be serialized”. Since the most of Beam IO Read/Write transforms 
is based on PTransform, then it means that all internal members of them should 
be serializable too or declared as transient/static. 

In the same time, Javadoc of PTransform says [2] that “PTransform doesn't 
actually support serialization, despite implementing Serializable. PTransform 
is marked Serializable solely because it is common for an anonymous DoFn, 
instance to be created within an apply() method of a composite PTransform”. 
And, on the other hand, “DoFn passed to a ParDo transform must be Serializable” 
[3] So, DoFn must be really serializable, PTransform is not necessary.

So, does it mean that the members (that are mostly AutoValue generated) of 
Read/Write PTransforms are free to be serializable or not if they don’t use 
anonymous DoFn's? For example, they are needed only for configuration on 
driver. However, if these members are used in DoFn or in other user defined 
objects further, when they will be involved on workers, then they must be 
serializable in any way.  Is it correct assumption?

[1] https://beam.apache.org/contribute/ptransform-style-guide/#serialization 
<https://beam.apache.org/contribute/ptransform-style-guide/#serialization>
[2] 
https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116
[3] 
https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282

Reply via email to