Re: [DISCUSS] Coder semantics

Lukasz Cwik Wed, 19 Jul 2017 11:50:25 -0700

+1 to what Kenn said.

Moving to a single encoding for all data types will alleviate many problems
with cross-language communication.
For backwards compatibility, the outer encoding/decoding methods remain on
coders. No one is required to invoke them, but callers may still choose to
do so until the methods are removed in the future.
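
To illustrate the length-prefixing Kenn mentions below, here is a minimal
sketch of a self-delimiting encoding. It uses only java.io and is not Beam's
Coder API; the class name LengthPrefixedStringCodec and its encode/decode
helpers are hypothetical, chosen just for this example. The point is that
decode() reads exactly one element and never relies on end-of-stream, so
several elements can share one stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of Beam or Flink.
public class LengthPrefixedStringCodec {

  // Writes a 4-byte big-endian length followed by the UTF-8 bytes.
  public static void encode(String value, OutputStream out) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(bytes.length);
    data.write(bytes);
    data.flush();
  }

  // Reads the length prefix, then exactly that many bytes. Anything after
  // the element is left untouched in the stream for the next reader.
  public static String decode(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int length = data.readInt();
    byte[] bytes = new byte[length];
    data.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Two elements written back to back into the same stream.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encode("first", out);
    encode("second", out);

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    System.out.println(decode(in)); // first
    System.out.println(decode(in)); // second
  }
}
```

An encoding that simply wrote the raw UTF-8 bytes would only be decodable
when the element is the last thing in the stream, which is the case the old
outer context allowed for.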


On Wed, Jul 19, 2017 at 7:08 AM, Kenneth Knowles <[email protected]>
wrote:

> The intended new semantics are as in Flink. The decode() method must know
> how much data it can read (e.g. by length-prefixing or another method). It
> should use the same encoding as the previous Context.INNER; in other words,
> it must not rely on the end-of-stream signal.
>
> Having two encodings for a certain coder doesn't work well with the
> idea that we can have cross-language encoding identified by URN, and anyhow
> there were very few INNER/OUTER special cases and actually a good portion
> of those were incorrectly implemented.
>
> This definitely should be clearly documented, at least in the javadoc.
>
> Kenn
>
> On Wed, Jul 19, 2017 at 4:56 AM, Aljoscha Krettek <[email protected]>
> wrote:
>
> > Hi,
> >
> > I want to quickly discuss coder semantics, specifically whether a Coder
> > should be required to know how much data it must/should read from the
> > input stream. Coders still have the deprecated encode()/decode() methods
> > that take a Context that can specify whether the input stream contains
> > only one element or multiple elements.
> >
> > My main motivation is that in Flink a TypeSerializer must know how much
> > data it can read; it can never rely on the “remaining bytes” of the input
> > stream to determine whether it’s finished. Currently the situation is a
> > bit unclear, i.e. the Flink Runner only works with Coders that know when
> > they should finish reading.
> >
> > Best,
> > Aljoscha
>
