Thank you for clarification, Luke.

> On 6 Jan 2020, at 20:03, Luke Cwik <[email protected]> wrote:
> 
> Anything that is reachable by the DoFn/CombineFn/*Fn needs to be 
> serializable. [1] is saying that it is common to have an anonymous inner 
> class for a DoFn which because of its serialization capture will get the 
> encompassing class which is typically a PTransform. If you are careful about 
> reachability, you can decide to not mark lots of things as serializable and 
> this is good because it decreases the size of the serialized *Fn blob as well.
> 
> The [2] javadoc could be clarified that PTransform class supports 
> serialization but is only serialized when part of the serialization capture 
> of a DoFn/CombineFn/*Fn and otherwise will never be serialized.

Yes, I think it would be more clear in this sense. 


> 
> On Mon, Jan 6, 2020 at 10:19 AM Alexey Romanenko <[email protected] 
> <mailto:[email protected]>> wrote:
> Hello all,
> 
> I found myself that I’m a bit confused with Serialization requirements for 
> Beam transforms and I want to precise something.
> 
> Here [1] it's clearly mentioned that “DoFn, PTransform, CombineFn and other 
> instances will be serialized”. Since the most of Beam IO Read/Write 
> transforms is based on PTransform, then it means that all internal members of 
> them should be serializable too or declared as transient/static. 
> 
> In the same time, Javadoc of PTransform says [2] that “PTransform doesn't 
> actually support serialization, despite implementing Serializable. PTransform 
> is marked Serializable solely because it is common for an anonymous DoFn, 
> instance to be created within an apply() method of a composite PTransform”. 
> And, on the other hand, “DoFn passed to a ParDo transform must be 
> Serializable” [3] So, DoFn must be really serializable, PTransform is not 
> necessary.
> 
> So, does it mean that the members (that are mostly AutoValue generated) of 
> Read/Write PTransforms are free to be serializable or not if they don’t use 
> anonymous DoFn's? For example, they are needed only for configuration on 
> driver. However, if these members are used in DoFn or in other user defined 
> objects further, when they will be involved on workers, then they must be 
> serializable in any way.  Is it correct assumption?
> 
> Yes
>  
> 
> [1] https://beam.apache.org/contribute/ptransform-style-guide/#serialization 
> <https://beam.apache.org/contribute/ptransform-style-guide/#serialization>
> [2] 
> https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116
>  
> <https://github.com/apache/beam/blob/42dbb5d9c9fbf45676088a32f862101f03fa76fb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L116>
> [3] 
> https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282
>  
> <https://github.com/apache/beam/blob/e2bb239f0418f1c4949227ba3f51a5f4eb7235df/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L282>

Reply via email to