The intended new semantics are as in Flink. The decode() method must know how much data it can read (e.g. by length-prefixing or some other method). It should use the same encoding as the previous Context.INNER. In other words, it must not rely on the end-of-stream signal.
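To make the length-prefixing idea concrete, here is a minimal sketch in plain java.io. It does not use Beam's Coder API; the class name LengthPrefixedStringCodec and its encode/decode signatures are hypothetical, chosen only for illustration. The point is that decode() reads a length and then exactly that many bytes, so two values can sit back to back in one stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Beam or Flink.
public class LengthPrefixedStringCodec {

    // Writes a 4-byte big-endian length followed by the UTF-8 bytes.
    public static void encode(String value, OutputStream out) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(bytes.length);
        data.write(bytes);
        data.flush();
    }

    // Reads the length prefix, then exactly that many bytes.
    // It never consumes "whatever remains" in the stream.
    public static String decode(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] bytes = new byte[length];
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two elements written to the same stream, one after the other.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode("first", out);
        encode("second", out);

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println(decode(in)); // first
        System.out.println(decode(in)); // second
    }
}
```

An encoding that instead read until end-of-stream would swallow both elements in the first decode() call, which is why it cannot be used where the stream holds more than one element.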
Having two encodings for a given coder doesn't work well with the idea that we can have cross-language encodings identified by URN, and anyhow there were very few INNER/OUTER special cases, and a good portion of those were actually incorrectly implemented.

This definitely should be clearly documented, at least in the javadoc.

Kenn

On Wed, Jul 19, 2017 at 4:56 AM, Aljoscha Krettek <[email protected]> wrote:

> Hi,
>
> I want to quickly discuss coder semantics, specifically whether a Coder
> should be required to know how much data it must/should read from the input
> stream. Coders still have the deprecated encode()/decode() methods that
> take a Context that can specify whether the input stream is only one
> element or whether that stream contains multiple elements.
>
> My main motivation is that in Flink a TypeSerializer must know how much
> data it can read, it can never rely on the “remaining bytes” of the input
> stream to determine whether it’s finished. Currently the situation is a bit
> unclear, i.e. the Flink Runner only works with Coders that know when they
> should finish reading.
>
> Best,
> Aljoscha
