George Porter wrote:
While this representation would certainly be as compact as possible, wouldn't it prevent evolving the data structure over time? One of the nice features of Google Protocol Buffers and Thrift is that you can evolve the set of fields over time, and older/newer clients can talk to older/newer services. If the proposed Avro is evolvable, then perhaps I'm misunderstanding your statement about the lack of IDs in the serialized data.
Avro supports schema evolution. In Avro, the schema used to write the data must be available when the data is read. (In files, it is typically stored in the file metadata.)
If you have the schema that was used to write the data, and you're expecting a slightly different schema, then you simply keep those fields that are in both schemas and skip those that are not. This is equivalent to Thrift's and Protocol Buffers' support for schema evolution, but does not require manually assigning numeric field ids.
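To make that concrete, here is a rough sketch of the idea in Python. It is not Avro's actual binary encoding or API: the fixed-width layout, the list-of-pairs schema form, and the names write_record and read_record are all assumptions made up for illustration. The point is only that the data carries no field ids, and the reader walks it using the writer's schema while keeping the fields it expects.

```python
import struct

# Hypothetical toy format (not Avro's real encoding): a schema is an ordered
# list of (field name, type) pairs; "long" is 8 bytes big-endian, "string" is
# a 4-byte length followed by UTF-8 bytes.

def write_record(schema, record):
    """Serialize values in schema order; no field names or ids are written."""
    out = bytearray()
    for name, ftype in schema:
        if ftype == "long":
            out += struct.pack(">q", record[name])
        elif ftype == "string":
            raw = record[name].encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        else:
            raise ValueError(f"unsupported type: {ftype}")
    return bytes(out)

def read_record(writer_schema, reader_schema, data):
    """Walk the data with the writer's schema; keep fields the reader expects."""
    expected = {name for name, _ in reader_schema}
    record, pos = {}, 0
    for name, ftype in writer_schema:
        if ftype == "long":
            value = struct.unpack_from(">q", data, pos)[0]
            pos += 8
        elif ftype == "string":
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type: {ftype}")
        if name in expected:  # field is in both schemas: keep it
            record[name] = value
    return record

# An older writer and a newer reader that disagree on one field each.
writer_schema = [("id", "long"), ("name", "string"), ("legacy_code", "long")]
reader_schema = [("id", "long"), ("name", "string"), ("email", "string")]

data = write_record(writer_schema, {"id": 7, "name": "ada", "legacy_code": 42})
resolved = read_record(writer_schema, reader_schema, data)
print(resolved)
```

In this sketch, fields are matched by name, so nobody has to assign or coordinate numeric ids. A field the reader expects but the writer never wrote (email here) is simply left out.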
This feature can also be used to support projection. If you have records with many large fields, but only need a single field in a particular computation, then you can specify an expected schema with only that field, and the runtime will efficiently skip all of the other fields, returning a record with just the single, expected field.
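A small sketch of projection, again in Python and again under made-up assumptions: every field is a length-prefixed byte string, schemas are plain lists of field names, and write_fields and read_projection are hypothetical helpers, not Avro calls. Because each field's length is known from the writer's schema and the prefix, an unwanted field can be skipped with a seek instead of being read.

```python
import io
import struct

# Hypothetical toy format: every field is a 4-byte big-endian length followed
# by that many raw bytes, written in the order given by the writer's schema.

def write_fields(writer_schema, record):
    out = io.BytesIO()
    for name in writer_schema:
        out.write(struct.pack(">I", len(record[name])))
        out.write(record[name])
    return out.getvalue()

def read_projection(writer_schema, expected_schema, data):
    """Return only the expected fields, plus how many payload bytes were read."""
    stream = io.BytesIO(data)
    wanted = set(expected_schema)
    record, payload_bytes_read = {}, 0
    for name in writer_schema:
        (length,) = struct.unpack(">I", stream.read(4))
        if name in wanted:
            record[name] = stream.read(length)
            payload_bytes_read += length
        else:
            stream.seek(length, io.SEEK_CUR)  # skip without reading the payload
    return record, payload_bytes_read

writer_schema = ["thumbnail", "title", "body"]
encoded = write_fields(writer_schema, {
    "thumbnail": b"\x00" * 100_000,
    "title": b"Avro",
    "body": b"x" * 500_000,
})

# The computation only needs the title, so the expected schema names just that.
projected, payload_bytes_read = read_projection(writer_schema, ["title"], encoded)
print(projected, payload_bytes_read, len(encoded))
```

Only the four bytes of the title are actually read out of a record of roughly 600 KB. The two large fields cost one seek each.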
I also agree with Bryan, in that it would be unfortunate to have two different Apache projects with overlapping goals.
We already have both Thrift and Etch in the incubator, which have similar goals. Apache does not attempt to mandate that projects have disjoint goals. There are many ways to slice things, and Apache prefers to rely on survival of the fittest rather than forcing things together.
Regardless of features, both Protocol Buffers and Thrift have the advantage of having been debugged in mission-critical production environments.
Yes, but, as I've argued in other messages in this thread, they do not support the dynamic features we need. Adding those features would add new code that would share little with existing code in those projects. So, while the projects are conceptually similar, the implementations are necessarily different, and, without significant code overlap, separate projects seem more natural.
Doug
