On 05/14/2010 11:50 AM, Scott Carey wrote:
It doesn't cost anything in serialization size, but currently it costs a lot in performance.
Not necessarily. For Pig data in Java I think one might reasonably write
a custom reader and writer that reads and writes Pig data structures
directly. This could look something like the readSchema, writeSchema,
readJson and writeJson methods in the patch for AVRO-251.
https://issues.apache.org/jira/browse/AVRO-251
Start by looking at json.avsc (an Avro schema for arbitrary JSON data)
then see the readJson() method. It directly reads Avro data
corresponding to that schema into a Jackson JsonNode. ResolvingDecoder
enforces the schema.
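For reference, a universal schema for arbitrary JSON data can be written as a recursive union along these lines (an illustrative sketch only; the actual json.avsc in the AVRO-251 patch may differ in names and structure):

```json
{
  "type": "record",
  "name": "Json",
  "namespace": "example",
  "fields": [
    {"name": "value", "type": [
      "null", "boolean", "long", "double", "string",
      {"type": "array", "items": "Json"},
      {"type": "map", "values": "Json"}
    ]}
  ]
}
```

The recursion comes from the array and map branches referring back to the named record "Json" by name.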
Yes, that helps a lot. One question remains: how can I construct a recursive
schema programmatically?
I have a couple of options for the Pig Tuple Avro schema -- write it in JSON and
embed that in the source code, or construct it programmatically.
I'm currently constructing, programmatically, a schema specific to the Pig schema
being serialized, which is straightforward until I hit the map type and
recursion.
If you're not using a universal Pig schema then the above strategy may
or may not work. It might still work if the specific schema is always a
subset of the universal Pig schema, which I suspect it is.
To construct a recursive schema programmatically you need to do what the
schema parser does: create a record schema with createRecord(), create
its fields, including one or more that reference the record schema,
then call setFields() with the fields.
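Those steps can be sketched against the Avro Java Schema API roughly as follows. The "Node" record and its field names are hypothetical, chosen only to show the create-then-setFields pattern:

```java
import java.util.Arrays;
import org.apache.avro.Schema;

public class RecursiveSchemaDemo {
  // Build a hypothetical recursive record: a string value plus an array
  // of child nodes of the same record type.
  static Schema buildNodeSchema() {
    // 1. Create the record schema first, with no fields attached yet,
    //    so that the fields below can refer to it.
    Schema node = Schema.createRecord("Node", null, "example", false);

    // 2. Create its fields; "children" refers back to the record itself.
    Schema.Field value =
        new Schema.Field("value", Schema.create(Schema.Type.STRING), null, null);
    Schema.Field children =
        new Schema.Field("children", Schema.createArray(node), null, null);

    // 3. setFields() attaches the fields, closing the recursive loop.
    node.setFields(Arrays.asList(value, children));
    return node;
  }

  public static void main(String[] args) {
    // Prints the schema as JSON, with "children" typed as
    // {"type":"array","items":"Node"} -- a reference by name.
    System.out.println(buildNodeSchema());
  }
}
```

The order matters: setFields() can only be called once, so the record must exist as an empty named schema before any field that references it is built.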
Doug