Re: [PROPOSAL] new subproject: Avro

Doug Cutting Fri, 03 Apr 2009 11:52:30 -0700

Bryan Duxbury wrote:

It's not actually a different data format, is it? You're saying that the user wouldn't specify the field IDs, but you'd fundamentally still use field ids for compactness and the like.

Field ids are not present in Avro data except in the schema. A record's fields are serialized in the order that the fields occur in the record's schema, with no per-field annotations whatsoever. For example, a record that contains a string and an int is serialized simply as a string followed by an int: nothing before, nothing between and nothing after. So, yes, it is a different data format.
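To make the string-then-int example concrete, here is a minimal Python sketch. It assumes the binary encoding described in the Avro specification (ints as zig-zag variable-length integers; strings as a length, encoded the same way, followed by UTF-8 bytes). The schema, the record, and the function names are hypothetical illustrations, not the Avro reference implementation.

```python
import json

# Hypothetical record schema in Avro's JSON schema syntax: a string, then an int.
SCHEMA = json.loads("""
{"type": "record", "name": "Example",
 "fields": [{"name": "label", "type": "string"},
            {"name": "count", "type": "int"}]}
""")

def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as a zig-zag, variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # map signed to unsigned: 0->0, -1->1, 1->2, ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length followed by the bytes themselves."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

ENCODERS = {"string": encode_string, "int": zigzag_varint}

def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are written back to back in schema order, with no field ids,
    # tags, or delimiters before, between, or after them.
    return b"".join(ENCODERS[f["type"]](record[f["name"]])
                    for f in schema["fields"])

encoded = encode_record(SCHEMA, {"count": 3, "label": "hi"})
print(encoded)  # b'\x04hi\x06'
```

Note that the order of keys in the Python dict is irrelevant: only the schema determines field order, which is why a reader needs the schema to decode the bytes.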

The bottom line is that I would love to see greater cooperation between Hadoop and Thrift. Unless it's impossible or impractical for Thrift to be useful here, I think we'd be willing to work towards Hadoop's needs.

Perhaps Thrift could be augmented to support Avro's JSON schemas and serialization. Then it could interoperate with other Avro-based systems. But then Thrift would have yet another serialization format that every language would need to implement for it to be useful...

Avro will only ever have one serialization format. Thrift fundamentally standardizes an API, not a data format. Avro fundamentally is a data format specification, like XML. Thrift could implement this specification. The Avro project includes reference implementations, but the format is intended to be simple enough and the specification stable enough that others might reasonably develop alternate, independent implementations.


Doug
