Re: [PROPOSAL] new subproject: Avro

Doug Cutting Tue, 07 Apr 2009 09:17:13 -0700

Chad Walters wrote:

I do think, however, that it will be very
difficult for them to work together properly if the goal of code
reuse by Thrift is not an explicit goal of Avro.

Code reuse is an explicit goal of Avro. It's an open source project
with public APIs intended to expose all of its functionality.

I think that by working closely with the Thrift community directly in
the Thrift code base, you will get several significant benefits.

It's not like I did not consider this approach, evolving Thrift to
better support my needs. In fact, I considered it for months before
abandoning it. I am very familiar with these arguments.

I am starting a new serialization project fully aware of the hazards. I
feel that, on balance, it is considerably simpler for Avro to be
developed separately and that this will not adversely affect its users
or its developer community. You may disagree. As volunteers here, we
are both free to do as we choose.

To support the second use case, dynamic schema interpretation, there
is definitely significant new code to be written. Note that this code
is essentially the same code wherever you are writing it.

This is a primary case for Avro. Without it, Avro's a non-starter.
And, as you note, this is new code that must be written for each
platform. That's primarily what Avro is. Fitting this code into Thrift
would only make it more complicated.
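
As a rough illustration of what "dynamic schema interpretation" means here, the sketch below loads a schema written as JSON at runtime and uses it to drive a generic encoder and decoder, with no generated classes. The schema layout, the `encode`/`decode` helpers, and the byte layout are assumptions made up for this example; they are not Avro's or Thrift's actual schema syntax, API, or wire format.

```python
import json
import struct

# Hypothetical schema, written as JSON and loaded at runtime.
# The layout is illustrative only, not a real Avro or Thrift schema.
SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "int"}]}
""")


def encode(schema, value):
    """Serialize value by walking the schema; no generated code is involved."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "int":
        return struct.pack(">i", value)  # 4-byte big-endian signed integer
    if kind == "string":
        data = value.encode("utf-8")
        return struct.pack(">I", len(data)) + data  # length prefix, then bytes
    if kind == "record":
        # Fields are written in schema order; no per-field tag is stored.
        return b"".join(encode(f["type"], value[f["name"]])
                        for f in schema["fields"])
    raise ValueError(f"unsupported type: {kind}")


def decode(schema, buf, pos=0):
    """Return (value, next_pos), driven only by the schema."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "int":
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if kind == "string":
        (length,) = struct.unpack_from(">I", buf, pos)
        start = pos + 4
        return buf[start:start + length].decode("utf-8"), start + length
    if kind == "record":
        out = {}
        for f in schema["fields"]:
            out[f["name"]], pos = decode(f["type"], buf, pos)
        return out, pos
    raise ValueError(f"unsupported type: {kind}")


record = {"name": "Ada", "age": 36}
data = encode(SCHEMA, record)
decoded, end = decode(SCHEMA, data)
```

The same interpreter has to be written once per platform, which is the "new code" both sides of this thread agree on.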

Whatever
work you are doing in Avro to be able to dynamically interpret JSON
IDL could just be directly implemented in Thrift -- we would just
define a JSON version of the Thrift IDL which would look a lot like
Avro's IDL. To help further with interoperability we could make the
Thrift compiler generate the JSON IDL from the Thrift IDL as another
output target.

Sure, we could bolt Avro's features onto the side of Thrift, but that
doesn't make it easier for me to deliver Avro's features nor any easier
for folks to use them. And Thrift doesn't need a second IDL format. It
already suffers from too many formats. I seek a single format, not a
multitude.

The basic upshot of the above is that it is not that hard to see how
Avro could be directly integrated into Thrift if you were willing to
entertain that option and I believe that you would see significant
benefits that would more than offset the impact to your own ease of
development about which you expressed concerns.


I am unlikely to implement it myself, as it does not address my needs.

I am proposing that the IDL would
only allow for field IDs to be omitted in the case where the schema
was being interpreted dynamically -- no static bindings could be
generated from IDL without fully specified field IDs. So if you are
only interested in dynamic interpretation, you never have to look at
or even think about field IDs. Does that in any way alter your stance
here?

Not really. It adds an "except on Tuesday" clause in the specification,
which is not ideal. In Avro we can generate static bindings without
using field ids.
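
One way static bindings can do without numeric field ids is to bind fields by name. The sketch below is only an assumption about how that could look, not Avro's generated code: `make_binding` and `bind` are hypothetical helpers, and the schema layout is invented for the example.

```python
from dataclasses import fields, make_dataclass

# Hypothetical mapping from schema type names to Python types.
PY_TYPES = {"string": str, "int": int}

# Hypothetical schema; note that no field carries a numeric id.
USER_SCHEMA = {
    "name": "User",
    "fields": [{"name": "name", "type": "string"},
               {"name": "age", "type": "int"}],
}


def make_binding(schema):
    """Build a class whose attributes come from the schema's field names."""
    specs = [(f["name"], PY_TYPES[f["type"]]) for f in schema["fields"]]
    return make_dataclass(schema["name"], specs)


def bind(cls, decoded):
    """Populate a binding from decoded data, matching fields by name only."""
    return cls(**{f.name: decoded[f.name] for f in fields(cls)})


User = make_binding(USER_SCHEMA)
# Field order and unknown extra fields in the input do not matter here.
user = bind(User, {"age": 36, "extra": True, "name": "Ada"})
```

Because the lookup key is the field name, reordering fields or adding new ones does not require an id to be assigned or tracked anywhere.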

It could be a floor wax and a dessert topping!


Love the SNL reference, but I don't think it is really apropos. My
vision for Thrift with Avro's features folded in as a unified
framework for cross-language serialization, covering a variety of use
cases, is not jamming two completely heterogeneous things together. I
can easily see wanting to take structures represented in one
serialization format from disk and send them out over RPC. Thrift
provides the means to do this kind of thing seamlessly, with formats
appropriate to both use cases, rather than selecting a format that is
good for one use case and so-so for the other.

I believe that the cost of supporting multiple formats is too high. We
differ on that point. I don't think one-stop-shopping is appropriate
here, but prefer to provide an à la carte format.


Doug
