On 4/3/09 12:03 PM, "George Porter" <[email protected]> wrote:
> On Apr 3, 2009, at 11:37 AM, Doug Cutting wrote:
>
>> Field ids are not present in Avro data except in the schema. A
>> record's fields are serialized in the order that the fields occur in
>> the records schema, with no per-field annotations whatsoever. For
>> example, a record that contains a string and an int is serialized
>> simply as a string followed by an int, nothing before, nothing
>> between and nothing after. So, yes, it is a different data format.
>
> While this representation would certainly be as compact as possible,
> wouldn't it prevent evolving the data structure over time? One of the
> nice features of Google Protocol Buffers and Thrift is that you can
> evolve the set of fields over time, and older/newer clients can talk
> to older/newer services. If the proposed Avro is evolvable, then
> perhaps I'm misunderstanding your statement about the lack of IDs in
> the serialized data.

From a quick perusal of the serialization format -- it contains headers
with type/schema information, and other metadata blocks. The types can
be inferred from this, and if this is done right then older/newer
clients will be able to read things just fine.

What can't be done is mixing two different formats in the same stream
if headers define the format of the whole stream. I have not looked
much deeper than that, but it looks like schema evolution is feasible.

> I also agree with Bryan, in that it would be unfortunate to have two
> different Apache projects with overlapping goals. Regardless of
> features, both protocol buffers and thrift have the advantage of being
> debugged in mission-critical production environments.
>
> -George
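
To make the "string followed by an int" point concrete, here is a rough
Python sketch. It is not the Avro implementation: the helper names and
the list-of-tuples schema are made up for illustration, the zigzag
varint encoding is my reading of how Avro writes ints and string
lengths, and the resolve() step is only an assumption about how a
reader holding the writer's schema could handle added or dropped
fields.

```python
import io

def write_long(buf, n):
    # Zigzag maps signed to unsigned (0->0, -1->1, 1->2), then 7 bits per byte.
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    while z > 0x7F:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))

def read_long(buf):
    z, shift = 0, 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            return (z >> 1) ^ -(z & 1)  # undo zigzag
        shift += 7

def encode_record(schema, record):
    # Fields are written in schema order, with no ids or tags in between.
    buf = io.BytesIO()
    for name, typ in schema:
        if typ == "string":
            raw = record[name].encode("utf-8")
            write_long(buf, len(raw))
            buf.write(raw)
        elif typ == "int":
            write_long(buf, record[name])
        else:
            raise ValueError("unsupported type: " + typ)
    return buf.getvalue()

def decode_record(writer_schema, data):
    # The bytes only make sense together with the schema they were written with.
    buf = io.BytesIO(data)
    out = {}
    for name, typ in writer_schema:
        if typ == "string":
            out[name] = buf.read(read_long(buf)).decode("utf-8")
        elif typ == "int":
            out[name] = read_long(buf)
        else:
            raise ValueError("unsupported type: " + typ)
    return out

def resolve(record, reader_schema):
    # Hypothetical evolution step: match fields by name, drop fields the
    # reader does not know, fill in defaults for fields the writer lacked.
    return {name: record.get(name, default)
            for name, _typ, default in reader_schema}

writer_schema = [("name", "string"), ("age", "int")]
data = encode_record(writer_schema, {"name": "foo", "age": 3})
print(data)  # b'\x06foo\x06'

reader_schema = [("name", "string", ""), ("email", "string", "unknown")]
print(resolve(decode_record(writer_schema, data), reader_schema))
```

The encoded record is five bytes: a length, "foo", and the int. All the
field names live in the schema, which is why the reader needs the
writer's schema from the stream header to make sense of the data.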
