With the schema in hand, you don't need to tag data with field numbers or types, since that information is all in the schema. So, having the schema, you can use a simpler data format.
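To make that concrete, here's a toy sketch (my own illustration, not any real wire format) of the same record encoded with per-field tags versus schema-driven, tag-free encoding:

```python
import struct

# Record to encode: {id: 42 (int32), name: "ada" (string)}

def encode_tagged(rec):
    # Each field carries a (field-number, type) tag pair on the wire,
    # so a reader can decode without the schema.
    name = rec["name"].encode()
    out = bytes([1, 0x08]) + struct.pack(">i", rec["id"])          # field 1, type=int32
    out += bytes([2, 0x0B]) + struct.pack(">i", len(name)) + name  # field 2, type=string
    return out

def encode_schema_driven(rec):
    # The reader already has the schema, so fields appear in schema
    # order with no tags at all.
    name = rec["name"].encode()
    return struct.pack(">i", rec["id"]) + struct.pack(">i", len(name)) + name

rec = {"id": 42, "name": "ada"}
tagged = encode_tagged(rec)
bare = encode_schema_driven(rec)
print(len(tagged), len(bare))  # prints "15 11": the tags cost 4 bytes here
```

The saving grows with the number of fields, which is the whole argument for schema-driven formats.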

To a degree, we already have that in Thrift - we call it the DenseProtocol.

Would you write parsers for Thrift's IDL in every language? Or would you use JSON, as Avro does, to avoid that?

When it comes to having a code-usable IDL for the schema, I'm totally pro-JSON.
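For concreteness, Avro expresses its schemas as plain JSON, so any language with a JSON parser can consume them without a custom IDL parser. A record schema looks roughly like this (field names here are just an example):

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id",   "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
```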

Once you're using a different IDL and a different data format, what's shared with Thrift? Fundamentally, those two things define a serialization system, no?

It's not actually a different data format, is it? You're saying that the user wouldn't specify the field IDs, but you'd fundamentally still use field IDs for compactness and the like. You may not use actual Thrift-generated objects, but you could certainly use the Binary or Compact protocol from Thrift to do all the writing to the wire.
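As a sketch of why field IDs buy compactness: Thrift's compact protocol packs the delta from the previous field ID together with a type nibble into one byte, and encodes integers as zigzag varints. The code below is my simplified illustration of that technique, not the exact wire format:

```python
def zigzag(n):
    # Map signed ints to unsigned so small magnitudes stay small:
    # 0, -1, 1, -2 -> 0, 1, 2, 3
    return (n << 1) ^ (n >> 63)

def varint(n):
    # Little-endian base-128 varint, 7 bits per byte.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if not n:
            out.append(b)
            return bytes(out)
        out.append(b | 0x80)

def field_header(prev_id, field_id, type_id):
    # Pack the field-ID delta (1..15) with the type into a single byte;
    # larger jumps would fall back to an explicit ID (omitted here).
    delta = field_id - prev_id
    assert 1 <= delta <= 15
    return bytes([(delta << 4) | type_id])

# Two i32 fields (type nibble 5): field 1 = -3, field 2 = 300.
buf = field_header(0, 1, 5) + varint(zigzag(-3))
buf += field_header(1, 2, 5) + varint(zigzag(300))
print(buf.hex())  # prints "150515d804" -- five bytes for both fields
```

Small IDs plus small deltas mean most field headers fit in one byte, whether the IDs come from the user or are assigned automatically from the schema.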

You might also be able to use (or contribute to) Thrift's RPC-level stuff like server implementations. We have some respectable Java servers written, and if those aren't enough for your uses, I'd actually be really interested in seeing if we could generalize some of the Hadoop stuff to be useful within Thrift.

The bottom line is that I would love to see greater cooperation between Hadoop and Thrift. Unless it's impossible or impractical for Thrift to be useful here, I think we'd be willing to work towards Hadoop's needs.

-Bryan
