This has lain dormant while I was drawn off to other things. I'm now looping back so there are no surprises in my upcoming (third) revision to PR #662 [1], which will use protocol buffers instead of JSON schema or Avro (the two prior versions; now I know what the Runner API looks like in every format :-)).

Here's the reasoning:

1. Since the Fn API requires the SDK harness to have protocol buffers support, there is no portability to be gained by having a proto-independent JSON schema or Avro schema for the Runner API. As currently designed, a language will need proto support in order to implement a Beam SDK.

2. Since proto has a JSON format that can be used for human readability, there's not really a usability benefit to using JSON schema and some other form of JSON.

3. Generation of helper libraries for proto is nice versus having a json schema, where support for generating POJOs, etc, might be incomplete or strange for some languages.

4. Some of the core generic "graph with stuff on the nodes and edges" definitions can be shared.

If I've overlooked something, I'd love to hear about it.

Kenn

[1] https://github.com/apache/beam/pull/662

On Fri, Jul 15, 2016 at 8:24 AM, Lukasz Cwik <[email protected]> wrote:

> Just to give people an update, I'm still working on collecting data.
>
> On Wed, Jun 29, 2016 at 10:47 AM, Aljoscha Krettek <[email protected]> wrote:
>
> > My bad, I didn't know that. Thanks for the clarification!
> >
> > On Wed, 29 Jun 2016 at 16:38 Daniel Kulp <[email protected]> wrote:
> >
> > > > On Jun 27, 2016, at 10:24 AM, Aljoscha Krettek <[email protected]> wrote:
> > > >
> > > > Out of the systems you suggested Thrift and ProtoBuf3 + gRPC are probably
> > > > best suited for the task. Both of these provide a way for generating
> > > > serializers as well as for specifying an RPC interface. Avro and
> > > > FlatBuffers are only dealing in serializers and we would have to roll our
> > > > own RPC system on top of these.
> > >
> > > Just a point of clarification, Avro does handle RPC as well as
> > > serialization. It's one of the main bullets on their overview page:
> > >
> > > http://avro.apache.org/docs/current/index.html
> > >
> > > Unfortunately, their documentation around the subject really sucks. Some
> > > info at:
> > >
> > > https://cwiki.apache.org/confluence/display/AVRO/Porting+Existing+RPC+Frameworks
> > >
> > > and a “quick start”:
> > >
> > > https://github.com/phunt/avro-rpc-quickstart
> > >
> > > --
> > > Daniel Kulp
> > > [email protected] - http://dankulp.com/blog
> > > Talend Community Coder - http://coders.talend.com
