+1 to what Kenn said. Moving to a single encoding for all data types will alleviate many problems with cross-language communication. For backwards compatibility, the outer encoding/decoding methods are still present on coders. No one is required to invoke them, but callers may still choose to do so until the methods are removed in the future.
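
For illustration, here is a minimal sketch of the self-delimiting behaviour Kenn describes in the quoted message, using length-prefixing so that decode() never depends on the end-of-stream signal. The class name LengthPrefixedStrings and the 4-byte length prefix are assumptions made for this example only. It is plain Java, not the Beam Coder API and not the actual wire format of any Beam coder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Beam.
public class LengthPrefixedStrings {

  // Writes a 4-byte big-endian length followed by the UTF-8 bytes.
  public static void encode(String value, OutputStream out) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(bytes.length);
    data.write(bytes);
    data.flush();
  }

  // Reads exactly one element: the length prefix tells us how many bytes
  // to consume, so any bytes after the element stay in the stream.
  public static String decode(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int length = data.readInt();
    byte[] bytes = new byte[length];
    data.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Two elements share one stream; each decode() stops at its own boundary.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encode("first", out);
    encode("second", out);

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    System.out.println(decode(in));
    System.out.println(decode(in));
  }
}
```

Because every element carries its own length, the same bytes decode identically whether the element is alone in the stream or followed by others, which is what makes a single encoding per coder workable.
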

On Wed, Jul 19, 2017 at 7:08 AM, Kenneth Knowles <[email protected]> wrote:

> The intended new semantics are as in Flink. The decode() method must know
> how much data it can read (e.g. by length-prefixing or other method). It
> should use the same encoding as the previous Context.INNER. In other words,
> not relying on the end-of-stream signal.
>
> Having two encodings for a certain coder doesn't work well with the
> idea that we can have cross-language encoding identified by URN, and anyhow
> there were very few INNER/OUTER special cases and actually a good portion
> of those were incorrectly implemented.
>
> This definitely should be clearly documented, at least in the javadoc.
>
> Kenn
>
> On Wed, Jul 19, 2017 at 4:56 AM, Aljoscha Krettek <[email protected]>
> wrote:
>
> > Hi,
> >
> > I want to quickly discuss coder semantics, specifically whether a Coder
> > should be required to know how much data it must/should read from the
> > input stream. Coders still have the deprecated encode()/decode() methods
> > that take a Context that can specify whether the input stream is only one
> > element or whether that stream contains multiple elements.
> >
> > My main motivation is that in Flink a TypeSerializer must know how much
> > data it can read, it can never rely on the “remaining bytes” of the input
> > stream to determine whether it’s finished. Currently the situation is a
> > bit unclear, i.e. the Flink Runner only works with Coders that know when
> > they should finish reading.
> >
> > Best,
> > Aljoscha
