Re: [DISCUSS] Coder semantics

Kenneth Knowles Wed, 19 Jul 2017 07:09:54 -0700

The intended new semantics are the same as in Flink: the decode() method
must know how much data it can read (e.g. by length-prefixing or some other
method). It should use the same encoding as the previous Context.INNER. In
other words, it must not rely on the end-of-stream signal.
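To make the length-prefixing idea concrete, here is a minimal sketch in plain
java.io. It is only an illustration, not the Beam Coder API: the class name
LengthPrefixedStringCodec and its encode()/decode() helpers are hypothetical.
The point is that decode() reads exactly the bytes belonging to one element,
so several elements can share one stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Beam or Flink.
class LengthPrefixedStringCodec {

  // Writes a 4-byte big-endian length followed by the UTF-8 bytes.
  static void encode(String value, OutputStream out) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(bytes.length);
    data.write(bytes);
    data.flush();
  }

  // Reads the length prefix, then exactly that many bytes. It never
  // consumes the rest of the stream and never waits for end-of-stream.
  static String decode(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int length = data.readInt();
    byte[] bytes = new byte[length];
    data.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Two elements written back to back into the same stream.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encode("first", out);
    encode("second", out);

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    System.out.println(decode(in));
    System.out.println(decode(in));
  }
}
```

An encoding that instead read "everything until end-of-stream" would swallow
both elements in the first decode() call, which is the failure mode a runner
like Flink cannot tolerate.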

Having two encodings for a single coder doesn't work well with the idea
that we can have cross-language encodings identified by URN. Anyhow, there
were very few INNER/OUTER special cases, and a good portion of those were
incorrectly implemented.

This definitely should be clearly documented, at least in the javadoc.

Kenn

On Wed, Jul 19, 2017 at 4:56 AM, Aljoscha Krettek <[email protected]>
wrote:

> Hi,
>
> I want to quickly discuss coder semantics, specifically whether a Coder
> should be required to know how much data it must/should read from the input
> stream. Coders still have the deprecated encode()/decode() methods that
> take a Context that can specify whether the input stream contains only one
> element or multiple elements.
>
> My main motivation is that in Flink a TypeSerializer must know how much
> data it can read; it can never rely on the “remaining bytes” of the input
> stream to determine whether it’s finished. Currently the situation is a bit
> unclear, i.e. the Flink Runner only works with Coders that know when they
> should finish reading.
>
> Best,
> Aljoscha
