Hey Pete,

Can you write up some documentation on DynamicSerDe for the wiki? It's come up a few times in discussion and I think it would be of general use for people.

Thanks,
Jeff

On Mon, Oct 27, 2008 at 12:13 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
>
>> You'd still need to write IDL parsers & processors for each platform.
>
> Fyi - Hadoop already has this for Java - in hive/serde/DynamicSerDe. This is
> exactly that and gives one the ability to read and write thrift and
> non-thrift data without compilation.
>
> -- pete
>
> On 10/27/08 12:01 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
>
> Ted Dunning wrote:
>> I don't think that it would be a major inconvenience in any of the major
>> scripting languages to change the meaning of "open" to mean that you must
>> read the IDL for a file, generate a reading script, load that and now be
>> ready to read. This is a scripting language after all.
>
> That sounds like compilation, which isn't very scripty. It's certainly
> workable, but not optimal. We want to push this stack all the way up to
> spreadsheet-type programmers, who define new record types interactively.
> Do we really want a GUI to run the Thrift compiler each time a file is
> opened, and load new code in?
>
>> Note that you are saying that the writer should have a schema. This seems
>> to contradict your previous statement and agree with mine.
>
> We can induce a schema. If an application doesn't specify an output
> schema then the first instance written might implicitly define the
> schema. Or you could be more lax and modify the schema as instances are
> written to match all instances, then append it at the end of the file.
> So in the binary format there would always be a schema. It would be
> used for compaction and available to readers to describe the data.
>
>>> So, how well does Thrift meet these needs?
>>
>> Very closely, actually, especially if you adjust it to allow the IDL to be
>> inside the file.
>
> Thrift has a lot of the parts, and one could probably define a Thrift
> protocol that does this. Looking through the Thrift mail archives, it
> seems that TDenseProtocol with an IDL in the file would get you partway.
> You'd still need to write IDL parsers & processors for each platform.
> I'm not sure it would be any less work than to build this from
> scratch, but I guess that's up to me to prove!
>
> On one hand, it's good to have an architecture that embraces more
> different data formats. But, in practice, it's nice to have actual data
> in fewer formats, since otherwise you end up having to support the cross
> product of formats and platforms.
>
>> We should also consider the JAQL work.
>
> Yes. I've started to look at that more. Their examples imply a binary
> format for JSON, but I can find no details.
>
> Doug
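
For anyone following along, here is a rough sketch of the "lax" schema induction Doug describes in the quoted thread: widen the schema as each instance is written, then append it at the end of the file. This is only an illustration under stated assumptions, not how DynamicSerDe or Thrift actually work. The function names, the type names, the "#schema" trailer and the use of JSON text lines in place of a binary format are all made up for the example.

```python
import json


def type_name(value):
    """Map a Python value to a hypothetical schema type name."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def widen(schema, record):
    """Update schema in place so that it also describes record."""
    for field, value in record.items():
        new = type_name(value)
        old = schema.setdefault(field, new)
        if old == new:
            continue
        # Assumed widening rule: int and double reconcile to double.
        if {old, new} == {"int", "double"}:
            schema[field] = "double"
        else:
            raise ValueError(f"field {field!r}: cannot reconcile {old} and {new}")


def write_records(records):
    """Return the file as a list of lines: one per record, schema last."""
    schema = {}
    lines = []
    for record in records:
        widen(schema, record)
        lines.append(json.dumps(record, sort_keys=True))
    # The induced schema goes at the end, once every instance has been seen.
    lines.append("#schema " + json.dumps(schema, sort_keys=True))
    return lines


if __name__ == "__main__":
    for line in write_records([{"id": 1, "name": "a"}, {"id": 2.5, "tags": "x"}]):
        print(line)
```

A reader of such a file would look at the trailer first to learn the field names and types, which is the "available to readers to describe the data" part. The stricter variant, where the first instance written defines the schema, would amount to calling widen once and rejecting any later record that does not match.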