Re: [PROPOSAL] new subproject: Avro

Doug Cutting Fri, 03 Apr 2009 13:03:11 -0700

George Porter wrote:

While this representation would certainly be as compact as possible, wouldn't it prevent evolving the data structure over time? One of the nice features of Google Protocol Buffers and Thrift is that you can evolve the set of fields over time, and older/newer clients can talk to older/newer services. If the proposed Avro is evolvable, then perhaps I'm misunderstanding your statement about the lack of IDs in the serialized data.

Avro supports schema evolution. In Avro, the schema used to write the data must be available when the data is read. (In files, it is typically stored in the file metadata.)

If you have the schema that was used to write the data, and you're expecting a slightly different schema, then you simply keep those fields that are in both schemas and skip those that are not. This is equivalent to Thrift's and Protocol Buffers' support for schema evolution, but does not require manually assigning numeric field ids.
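As a rough illustration of the rule above, here is a minimal Python sketch. It is not Avro's wire format or API: the toy schema representation (an ordered list of name/type pairs), the fixed-width encoding, and the names write_record and read_record are all assumptions made up for this example. The point is only that the serialized bytes carry no field ids, and the reader uses the writer's schema to walk them.

```python
import io
import struct

# Toy schema (an assumption for this sketch): an ordered list of
# (field_name, type) pairs, where type is "long" or "string".

def write_record(schema, record):
    """Serialize values in schema order. No field names or ids are written."""
    out = io.BytesIO()
    for name, ftype in schema:
        value = record[name]
        if ftype == "long":
            out.write(struct.pack(">q", value))
        elif ftype == "string":
            data = value.encode("utf-8")
            out.write(struct.pack(">I", len(data)))
            out.write(data)
        else:
            raise ValueError("unsupported type: " + ftype)
    return out.getvalue()

def read_record(writer_schema, reader_schema, payload):
    """Walk the bytes using the writer's schema; keep only the fields
    that also appear in the reader's (expected) schema."""
    wanted = {name for name, _ in reader_schema}
    buf = io.BytesIO(payload)
    result = {}
    for name, ftype in writer_schema:
        if ftype == "long":
            (value,) = struct.unpack(">q", buf.read(8))
        elif ftype == "string":
            (length,) = struct.unpack(">I", buf.read(4))
            value = buf.read(length).decode("utf-8")
        else:
            raise ValueError("unsupported type: " + ftype)
        if name in wanted:
            result[name] = value
    return result

writer_schema = [("id", "long"), ("name", "string"), ("email", "string")]
reader_schema = [("id", "long"), ("email", "string"), ("phone", "string")]

payload = write_record(
    writer_schema, {"id": 7, "name": "Ada", "email": "ada@example.org"}
)
resolved = read_record(writer_schema, reader_schema, payload)
print(resolved)
```

Here "name" is dropped because the reader does not expect it, and "phone" is simply absent because the writer never wrote it. How a real implementation fills in fields the writer lacks is not covered by this sketch.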

This feature can also be used to support projection. If you have records with many large fields, but only need a single field in a particular computation, then you can specify an expected schema with only that field, and the runtime will efficiently skip all of the other fields, returning a record with just the single, expected field.
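A hedged sketch of how such skipping might look, again in Python and again not Avro's actual encoding or API. It assumes a toy format in which every field is a length-prefixed string, and the names encode and project are hypothetical. Unwanted fields are stepped over by seeking past their bytes, so they are never decoded.

```python
import io
import struct

def encode(schema, record):
    """Toy encoding (an assumption): each value, in schema order,
    as a 4-byte big-endian length followed by UTF-8 bytes."""
    out = io.BytesIO()
    for name in schema:
        data = record[name].encode("utf-8")
        out.write(struct.pack(">I", len(data)) + data)
    return out.getvalue()

def project(writer_schema, expected_fields, payload):
    """Return only the expected fields, seeking past all others."""
    buf = io.BytesIO(payload)
    result = {}
    for name in writer_schema:
        (length,) = struct.unpack(">I", buf.read(4))
        if name in expected_fields:
            result[name] = buf.read(length).decode("utf-8")
        else:
            buf.seek(length, io.SEEK_CUR)  # skip the bytes without decoding
    return result

writer_schema = ["title", "body", "attachment"]
payload = encode(
    writer_schema,
    {"title": "hello", "body": "x" * 10_000, "attachment": "y" * 50_000},
)
projected = project(writer_schema, {"title"}, payload)
print(projected)
```

The large "body" and "attachment" values cost only a length read and a seek each, which is the sense in which the skip is cheap in this toy format.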

I also agree with Bryan, in that it would be unfortunate to have two different Apache projects with overlapping goals.

We already have both Thrift and Etch in the incubator, which have similar goals. Apache does not attempt to mandate that projects have disjoint goals. There are many ways to slice things, and Apache prefers to rely on survival of the fittest rather than forcing things together.

Regardless of features, both protocol buffers and thrift have the advantage of being debugged in mission-critical production environments.

Yes, but, as I've argued in other messages in this thread, they do not support the dynamic features we need. Adding those features would add new code that would share little with existing code in those projects. So, while the projects are conceptually similar, the implementations are necessarily different, and, without significant code overlap, separate projects seem more natural.


Doug
