On 05/14/2010 11:50 AM, Scott Carey wrote:
It doesn't cost on the serialization size but currently it costs a lot on the 
performance side.

Not necessarily. For Pig data in Java I think one might reasonably write a custom reader and writer that reads and writes Pig data structures directly. This could look something like readSchema, writeSchema, readJson and writeJson methods in the patch for AVRO-251.

https://issues.apache.org/jira/browse/AVRO-251

Start by looking at json.avsc (an Avro schema for arbitrary JSON data) then see the readJson() method. It directly reads Avro data corresponding to that schema into a Jackson JsonNode. ResolvingDecoder enforces the schema.

Yes, it helps a lot.  One question remains: how can I construct a recursive 
schema programmatically?
I have a couple options for the pig Tuple avro schema -- write it in JSON and 
put that in the source code or programmatically construct it.
I'm currently programmatically constructing a schema specific to the Pig schema 
that is serialized, which is straightforward until I hit the map type and 
recursion.

If you're not using a universal Pig schema then the above strategy may or may not work. It might still work if the specific schema is always a subset of the universal Pig schema, which I suspect it is.

To construct a recursive schema programmatically you need to do what the schema parser does: create a record schema with createRecord(), create its fields, including one or more that reference the record schema, then call setFields() with those fields.
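A minimal sketch of that two-step pattern (the "Node" record name and its fields are made up for illustration; the Schema API calls are the standard org.apache.avro.Schema ones):

```java
import java.util.Arrays;
import org.apache.avro.Schema;

public class RecursiveSchemaExample {
  /**
   * Builds a self-referential record schema, e.g. a linked-list node:
   *   record Node { string value; union { null, Node } next; }
   */
  public static Schema build() {
    // Step 1: create the record shell first, with no fields yet,
    // so that a field can reference it before it is complete.
    Schema node = Schema.createRecord("Node", null, "example", false);

    // A union of null and the (still field-less) record itself,
    // so the recursion can terminate.
    Schema optionalNode = Schema.createUnion(
        Arrays.asList(Schema.create(Schema.Type.NULL), node));

    // Step 2: now set the fields, one of which points back at the record.
    node.setFields(Arrays.asList(
        new Schema.Field("value", Schema.create(Schema.Type.STRING),
            null, (Object) null),
        new Schema.Field("next", optionalNode, null, (Object) null)));
    return node;
  }

  public static void main(String[] args) {
    // Prints the schema as pretty JSON; "Node" appears by name inside
    // its own "next" field, which is how Avro renders recursion.
    System.out.println(build().toString(true));
  }
}
```

The order matters: createRecord() before the fields exist, setFields() after, because that is the only way a field's schema can refer to the enclosing record.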

Doug

