Thanks a lot for sharing the link. I took a quick look at the design and the Java implementation, and I think it could address my concern. It seems that it is still not supported in the Python SDK harness. Is there any plan for that?
Robert Bradshaw <rober...@google.com> wrote on Tue, Jul 30, 2019 at 12:33 PM:

> On Tue, Jul 30, 2019 at 11:52 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
>>>> Is it possible to add an interface such as `isSelfContained()` to the
>>>> `Coder`? This interface indicates
>>>> whether the serialized bytes are self contained. If it returns true,
>>>> then there is no need to add a prefixing length.
>>>> In this way, there is no need to introduce an extra protocol. Please
>>>> correct me if I missed something :)
>>>
>>> The question is how it is self contained. E.g. DoubleCoder is self
>>> contained because it always uses exactly 8 bytes, but one needs to know the
>>> double coder to leverage this. VarInt coder is self-contained a different
>>> way, as is StringCoder (which does just do prefixing).
>>
>> Yes, you are right! On second thought, we cannot add such an interface
>> to the coder, because the runner cannot call it. And just one more thought:
>> does it make sense to add a method such as "registerSelfContained
>> Coder(xxx)" or so to let users register the coders which can be processed
>> in the SDK Harness? It's the responsibility of the SDK harness to ensure
>> that the coder is supported.
>
> Basically, a "please don't add length prefixing to this coder, assume
> everyone else can understand it (and errors will ensue if anyone doesn't)"
> at the user level? Seems a bit dangerous. Also, there is not "the
> SDK"--there may be multiple other SDKs in general, and of course runner
> components, some of which may understand the coder in question and some of
> which may not.
>
> I would say that if this becomes a problem, we could look at the pros and
> cons of various remedies, this being one alternative.
>
>>> I am hopeful that schemas give us a rich enough way to encode the vast
>>> majority of types that we will want to transmit across language barriers
>>> (possibly with some widening promotions). For high performance one will
>>> want to use formats like arrow rather than one-off coders as well, which
>>> also biases us towards the schema work. The set of StandardCoders is not
>>> closed, and nor is the possibility of figuring out a way to communicate
>>> outside this set for a particular pair of languages, but I think it makes
>>> sense to avoid going that direction unless we have to due to the increased
>>> API surface area and complexity it imposes on all runners and SDKs.
>>
>> Great! Could you share some links about the schema work? It seems very
>> interesting and promising.
>
> https://beam.apache.org/contribute/design-documents/#sql--schema and of
> particular relevance https://s.apache.org/beam-schemas
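For anyone following the length-prefixing discussion quoted above, here is a minimal sketch of the three cases Robert mentions: a fixed-width double, a self-delimiting varint, and an opaque payload that a reader can only get past because of a length prefix. This is plain Python and not Beam's coder API; every function name below is hypothetical, and the big-endian double and base-128 varint layouts are assumptions made for illustration.

```python
import struct
from io import BytesIO


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit means 'more'."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # last byte, high bit clear
            return bytes(out)


def decode_varint(stream) -> int:
    """Self-delimiting: the reader stops at the first byte with the high bit clear."""
    result, shift = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def encode_double(x: float) -> bytes:
    """Fixed width: always exactly 8 bytes (big-endian assumed here)."""
    return struct.pack(">d", x)


def decode_double(stream) -> float:
    """The reader must already know the value is a double to read 8 bytes."""
    return struct.unpack(">d", stream.read(8))[0]


def length_prefix(payload: bytes) -> bytes:
    """Wrap bytes from an unknown coder so any reader can find where they end."""
    return encode_varint(len(payload)) + payload


def read_length_prefixed(stream) -> bytes:
    """Return the opaque payload without needing to understand its contents."""
    return stream.read(decode_varint(stream))


def demo():
    opaque = b"\x01custom-coder-bytes"  # stands in for a user-defined coder's output
    stream = BytesIO(encode_double(1.5) + encode_varint(300) + length_prefix(opaque))
    return decode_double(stream), decode_varint(stream), read_length_prefixed(stream)
```

The first two values can be read back only by a component that knows which encoding was used; the third can be skipped or forwarded by a component that has no idea what produced it, which is why the prefix is added for coders not understood on both sides.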