Ted Dunning wrote:
I don't think that it would be a major inconvenience in any of the major
scripting languages to change the meaning of "open" to mean that you must
read the IDL for a file, generate a reading script, load that and now be
ready to read.  This is a scripting language after all.

That sounds like compilation, which isn't very scripty. It's certainly workable, but not optimal. We want to push this stack all the way up to spreadsheet-type programmers, who define new record types interactively. Do we really want a GUI to run the Thrift compiler and load new code each time a file is opened?

Note that you are saying that the writer should have a schema.  This seems
to contradict your previous statement and agree with mine.

We can induce a schema. If an application doesn't specify an output schema then the first instance written might implicitly define the schema. Or you could be more lax and modify the schema as instances are written to match all instances, then append it at the end of the file. So in the binary format there would always be a schema. It would be used for compaction and available to readers to describe the data.
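
As a rough illustration of the laxer variant, here is a minimal Python sketch. The function names and type labels are made up for this example and are not part of Thrift or any existing file format; it only shows how a schema might be widened as each instance is written.

```python
# Hypothetical sketch of schema induction; nothing here is a Thrift API.

def type_name(value):
    """Map a Python value to a simple, invented type label."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")

def induce(record):
    """Schema implied by one record: field name -> set of type labels."""
    return {name: {type_name(v)} for name, v in record.items()}

def merge(schema, record):
    """Lax mode: return a widened schema that also matches this record."""
    merged = {name: set(types) for name, types in schema.items()}
    for name, types in induce(record).items():
        merged.setdefault(name, set()).update(types)
    # A field absent from the old schema or from this record becomes optional.
    for name in merged:
        if name not in record or name not in schema:
            merged[name].add("null")
    return merged

records = [
    {"name": "a", "age": 1},
    {"name": "b", "age": 2.5, "email": "b@example.com"},
]

schema = induce(records[0])      # the first instance defines the schema
for r in records[1:]:
    schema = merge(schema, r)    # later instances widen it
# `schema` is what a writer would append at the end of the file.
```

In the strict variant the writer would stop after `induce(records[0])` and reject any later record that doesn't match.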

So, how well does Thrift meet these needs?

Very closely, actually, especially if you adjust it to allow the IDL to be
inside the file.

Thrift has a lot of the parts, and one could probably define a Thrift protocol that does this. Looking through the Thrift mail archives, it seems that TDenseProtocol with an IDL in the file would get you partway. You'd still need to write IDL parsers & processors for each platform. I'm not sure it would be any less work than to build this from scratch, but I guess that's up to me to prove!

On one hand, it's good to have an architecture that embraces more data formats. But, in practice, it's nice to have actual data in fewer formats, since otherwise you end up having to support the cross product of formats and platforms.

We should also consider the JAQL work.

Yes. I've started to look at that more. Their examples imply a binary format for JSON, but I can find no details.

Doug
