I think, as Luke pointed out, one problem with Serializable is that the serialized representation isn't guaranteed to be stable across different JVM versions.
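To make that contrast concrete, here is a tiny, purely illustrative sketch: Java serialization's byte layout depends on the class's field layout, its (possibly auto-generated) serialVersionUID, and stream internals, whereas a coder's encoding is defined by the coder itself. The class names here are made up for the example; it assumes Beam's CoderUtils and StringUtf8Coder.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.util.CoderUtils;

public class StableEncodingDemo {
  // A stand-in user type; its Java-serialized form carries class metadata
  // and depends on the JVM's serialization internals.
  static class HourWindow implements Serializable {
    final long startMillis;
    HourWindow(long startMillis) { this.startMillis = startMillis; }
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(new HourWindow(1503561600000L));
    }
    System.out.println("Java serialization: " + bos.size() + " bytes");

    // A coder's encoding is specified by the coder, not by the JVM: here,
    // just the UTF-8 bytes of the string, identical on any JVM version.
    byte[] coded =
        CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "2017-08-24T09:00");
    System.out.println("StringUtf8Coder: " + coded.length + " bytes");
  }
}
```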
The only drawback I see here from the user's perspective is that if I have already written a custom coder to handle my type in some reasonable way, this change will ignore it and use a serializer instead. Any special logic in my coder (not serializing some fields, using a more efficient representation for either the serialized bytes or the decoded form) won't be applied, and I think this could lead to confusion.

For example, imagine someone writes a Coder that attempts to reuse decoded values (not unreasonable for things like hourly windows, where you may have many things referencing the same hour). A user may be surprised to find that there are many instances of each hour because we have chosen to use a serializer rather than the Coder they instructed us to use. A sketch of such a coder is at the end of this message.

On Thu, Aug 24, 2017 at 9:57 AM Lukasz Cwik <lc...@google.com.invalid> wrote:

> How does Kryo's FieldSerializer fail on WindowedValue/PaneInfo? Those seem
> like pretty simple types.
>
> Also, I don't see why they can't be tagged with Serializable, but I think
> the original reasoning was that coders should be used to guarantee a
> stable representation across JVM versions.
>
> On Thu, Aug 24, 2017 at 5:38 AM, Kobi Salant <kobi.sal...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I am working on Spark runner issue
> > https://issues.apache.org/jira/browse/BEAM-2669
> > "Kryo serialization exception when DStreams containing
> > non-Kryo-serializable data are cached".
> >
> > Currently, the Spark runner enforces the Kryo serializer, and in
> > shuffling scenarios we always use coders to transfer bytes over the
> > network. But Spark can also serialize data for caching/persisting, and
> > currently it uses Kryo for this as well.
> >
> > Today, when the user uses a class which is not Kryo-serializable, the
> > caching fails, and we thought to open up the option of using Java
> > serialization as a fallback.
> >
> > The RDDs/DStreams behind our PCollections are usually typed
> > RDD/DStream<WindowedValue<InputT>>, and when Spark tries to
> > Java-serialize them for caching purposes it fails because WindowedValue
> > is not Java-serializable.
> >
> > Is there any objection to adding "implements Serializable" to SDK
> > classes like WindowedValue, PaneInfo and others?
> >
> > Again, I want to emphasise that coders are a big part of the Spark
> > runner code, and we do use them whenever a shuffle is expected. Changing
> > the types of the RDDs/DStreams to byte[] would make the code pretty
> > unreadable and would weaken type-safety checks.
> >
> > Thanks,
> > Kobi
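Here is the minimal sketch of the interning Coder mentioned above. InterningCoder is a hypothetical name, not an existing Beam class, and it assumes the stream-only encode/decode signatures of the Beam Coder API; it wraps a delegate coder and returns a cached instance for values that have already been decoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CustomCoder;

/**
 * Hypothetical coder that interns decoded values so equal values (e.g. the
 * same hourly window) share a single instance in memory. This is exactly the
 * kind of coder-level behavior that is silently lost if the runner bypasses
 * the Coder and caches through a generic serializer instead.
 */
public class InterningCoder<T> extends CustomCoder<T> {

  private final Coder<T> delegate;

  // Cache of previously decoded values; equal values map to one instance.
  // Transient so the cache contents are not serialized with the coder.
  private transient ConcurrentHashMap<T, T> interned;

  public static <T> InterningCoder<T> of(Coder<T> delegate) {
    return new InterningCoder<>(delegate);
  }

  private InterningCoder(Coder<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public void encode(T value, OutputStream outStream) throws IOException {
    delegate.encode(value, outStream);
  }

  @Override
  public T decode(InStreamAlias inStream) throws IOException {
    T value = delegate.decode(inStream);
    T existing = interned().putIfAbsent(value, value);
    return existing != null ? existing : value;
  }

  private ConcurrentHashMap<T, T> interned() {
    // Recreate the map lazily after the coder itself is deserialized.
    if (interned == null) {
      interned = new ConcurrentHashMap<>();
    }
    return interned;
  }
}
```

If Spark caches the RDD through Kryo or Java serialization rather than through this coder, decode() is never called on the cached path, and the one-instance-per-hour property quietly disappears.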
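For completeness, a hedged sketch of the fallback Kobi describes: keep Kryo as Spark's serializer but route the problematic Beam types through Kryo's Java-serialization adapter. This only works once those types actually implement java.io.Serializable, which is what the thread is proposing; the registrator class name is made up for illustration.

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.JavaSerializer;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.serializer.KryoRegistrator;

/**
 * Illustrative registrator: tells Kryo to fall back to Java serialization
 * for WindowedValue and PaneInfo instead of its default FieldSerializer.
 * Requires both types to implement Serializable.
 */
public class BeamJavaFallbackRegistrator implements KryoRegistrator {
  @Override
  public void registerClasses(Kryo kryo) {
    // addDefaultSerializer also covers the concrete subclasses of the
    // abstract WindowedValue.
    kryo.addDefaultSerializer(WindowedValue.class, JavaSerializer.class);
    kryo.addDefaultSerializer(PaneInfo.class, JavaSerializer.class);
  }
}
```

Spark would pick this up via its standard spark.kryo.registrator setting, e.g. sparkConf.set("spark.kryo.registrator", BeamJavaFallbackRegistrator.class.getName()).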