Bryan Duxbury wrote:
It's not actually a different data format, is it? You're saying that the user wouldn't specify the field IDs, but you'd fundamentally still use field ids for compactness and the like.

Field ids are not present in Avro data except in the schema. A record's fields are serialized in the order that the fields occur in the record's schema, with no per-field annotations whatsoever. For example, a record that contains a string and an int is serialized simply as a string followed by an int: nothing before, nothing between and nothing after. So, yes, it is a different data format.
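
To make that concrete, here is a rough sketch in Python of what such an encoder might look like. It is not the reference implementation. The record name "Person" and the field names "name" and "age" are made up for illustration, and the primitive encodings (zig-zag variable-length integers, strings as a length followed by UTF-8 bytes) follow my reading of the Avro specification's binary encoding.

```python
import json

# Hypothetical schema for illustration: a record with a string and an int.
SCHEMA = json.loads("""
{"type": "record", "name": "Person",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age",  "type": "int"}]}
""")


def encode_long(n: int) -> bytes:
    """Encode a signed integer as a zig-zag variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the primitive types needed for this sketch.
ENCODERS = {"string": encode_string, "int": encode_long, "long": encode_long}


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field values in schema order; no field ids, names or tags."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


if __name__ == "__main__":
    # Field order comes from the schema, not from the dict passed in.
    print(encode_record(SCHEMA, {"age": 42, "name": "doug"}).hex())
```

The output bytes contain only the string's length, the string's bytes and the int. A reader needs the schema to know where one field ends and the next begins, which is why the schema has to travel with, or be known alongside, the data.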

The bottom line is that I would love to see greater cooperation between Hadoop and Thrift. Unless it's impossible or impractical for Thrift to be useful here, I think we'd be willing to work towards Hadoop's needs.

Perhaps Thrift could be augmented to support Avro's JSON schemas and serialization. Then it could interoperate with other Avro-based systems. But then Thrift would have yet another serialization format that every language would need to implement for it to be useful...

Avro will only ever have one serialization format. Thrift fundamentally standardizes an API, not a data format. Avro fundamentally is a data format specification, like XML. Thrift could implement this specification. The Avro project includes reference implementations, but the format is intended to be simple enough and the specification stable enough that others might reasonably develop alternate, independent implementations.

Doug
