As Luke pointed out, one problem with Serializable is that it isn't
guaranteed to be stable across different JVM versions.
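To make that concrete, here is a minimal sketch of why Java serialization is fragile across class/JVM changes. The `Event` type and `roundTrip` helper are hypothetical, purely illustrative: within one JVM the round trip works, but the stream identity is tied to `serialVersionUID` (derived from the class shape if not declared), so a writer/reader mismatch fails with `InvalidClassException` rather than degrading gracefully.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationStability {
    // Hypothetical value type. The explicit serialVersionUID pins the stream
    // identity; without it the JVM derives one from the class shape, and any
    // mismatch between writer and reader surfaces as
    // java.io.InvalidClassException on the read side.
    static class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Event(String name) { this.name = name; }
    }

    // Round-trips an Event through Java serialization and returns its name.
    static String roundTrip(String name) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Event(name));
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ((Event) in.readObject()).name;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("click")); // prints "click"
    }
}
```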

The only drawback I see here from the user perspective is that if I have
already written a custom coder to handle my type in some reasonable way,
this fallback will ignore it and use a serializer instead. If my coder had
any special logic (skipping some fields, using a more efficient
representation for either the serialized bytes or the deserialized form),
that logic won't be applied.

I think this could lead to confusion. Imagine someone writes a Coder that
attempts to reuse decoded values (not unreasonable for things like hourly
windows, where you may have many things referencing the same hour). A user
may be surprised to find that there are many instances of each hour because
we have chosen to use a serializer rather than the Coder they instructed us
to use.
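A minimal sketch of the kind of Coder I have in mind (not the actual Beam Coder API; `Hour` and the interning cache are hypothetical): the coder path hands back one shared instance per distinct hour, which a serializer-based caching path would silently bypass by constructing a fresh object on every deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterningCoderSketch {
    // Hypothetical value type standing in for an hourly window.
    static final class Hour implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        Hour(int value) { this.value = value; }
    }

    // Sketch of a coder that interns decoded values: every decode of the
    // same hour returns the same instance instead of a fresh allocation.
    private static final Map<Integer, Hour> CACHE = new ConcurrentHashMap<>();

    static void encode(Hour hour, OutputStream out) throws IOException {
        new DataOutputStream(out).writeInt(hour.value);
    }

    static Hour decode(InputStream in) throws IOException {
        return CACHE.computeIfAbsent(new DataInputStream(in).readInt(), Hour::new);
    }

    // Decoding the same payload twice yields the identical object.
    static boolean decodesShareInstance() throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        encode(new Hour(9), bytes);
        byte[] payload = bytes.toByteArray();
        return decode(new ByteArrayInputStream(payload))
                == decode(new ByteArrayInputStream(payload));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decodesShareInstance()); // prints "true"
    }
}
```

If the runner Java-serializes the cached values instead of calling `decode`, each read produces a distinct `Hour`, which is exactly the surprise described above.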

On Thu, Aug 24, 2017 at 9:57 AM Lukasz Cwik <lc...@google.com.invalid>
wrote:

> How does Kryo's FieldSerializer fail on WindowedValue/PaneInfo? Those seem
> like pretty simple types.
>
> Also, I don't see why they can't be tagged with Serializable, but I think
> the original reasoning was that coders should be used to guarantee a stable
> representation across JVM versions.
>
> On Thu, Aug 24, 2017 at 5:38 AM, Kobi Salant <kobi.sal...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I am working on Spark runner issue
> > https://issues.apache.org/jira/browse/BEAM-2669
> > "Kryo serialization exception when DStreams containing
> > non-Kryo-serializable data are cached"
> >
> > Currently, the Spark runner enforces the Kryo serializer, and in shuffling
> > scenarios we always use coders to transfer bytes over the network. But
> > Spark can also serialize data for caching/persisting, and currently it
> > uses Kryo for this.
> >
> > Today, when the user uses a class which is not Kryo-serializable, the
> > caching fails, and we thought to open up the option of using Java
> > serialization as a fallback.
> >
> > Our RDDs/DStreams behind the PCollections are usually typed
> > RDD/DStream<WindowedValue<InputT>>
> > and when Spark tries to Java-serialize them for caching purposes, it fails
> > because WindowedValue is not Java-serializable.
> >
> > Is there any objection to making SDK classes like WindowedValue, PaneInfo,
> > and others implement Serializable?
> >
> > Again, I want to emphasise that coders are a big part of the Spark runner
> > code, and we do use them whenever a shuffle is expected. Changing the types
> > of the RDDs/DStreams to byte[] would make the code pretty unreadable and
> > would weaken type safety checks.
> >
> > Thanks
> > Kobi
> >
>
