On 05/14/2010 11:50 AM, Scott Carey wrote:
It doesn't cost anything in serialization size, but currently it costs a lot in performance.
Not necessarily. For Pig data in Java I think one might reasonably write
a custom reader and writer that reads and writes Pig data structures
directly. This could look something like the readSchema, writeSchema,
readJson and writeJson methods in the patch for AVRO-251.
https://issues.apache.org/jira/browse/AVRO-251
Start by looking at json.avsc (an Avro schema for arbitrary JSON data)
then see the readJson() method. It directly reads Avro data
corresponding to that schema into a Jackson JsonNode. ResolvingDecoder
enforces the schema.
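For reference, a universal schema for arbitrary JSON data can be written as a recursive union along these lines (an illustrative sketch only; the actual json.avsc in the AVRO-251 patch may differ in names and structure):

```json
{
  "type": "record",
  "name": "Json",
  "namespace": "example",
  "fields": [
    {"name": "value", "type": [
      "null", "boolean", "long", "double", "string",
      {"type": "array", "items": "Json"},
      {"type": "map", "values": "Json"}
    ]}
  ]
}
```

The recursion comes from the array and map branches referring back to the named record "Json" by name.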
Yes, that helps a lot. One question remains: how can I construct a recursive
schema programmatically?
I have a couple of options for the Pig Tuple Avro schema -- write it in JSON and
embed that in the source code, or construct it programmatically.
I'm currently constructing, programmatically, a schema specific to the Pig schema
being serialized, which is straightforward until I hit the map type and
recursion.
If you're not using a universal Pig schema then the above strategy may
or may not work. It might still work if the specific schema is always a
subset of the universal Pig schema, which I suspect it is.
To construct a recursive schema programmatically you need to do what the
schema parser does: create a record schema with createRecord(), create
its fields, including one or more that reference the record schema,
then call setFields() with the fields.
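Those steps can be sketched against the Avro Java Schema API roughly as follows. The "Node" record and its field names are hypothetical, chosen only to show the create-then-setFields pattern:

```java
import java.util.Arrays;
import org.apache.avro.Schema;

public class RecursiveSchemaDemo {
  // Build a hypothetical recursive record: a string value plus an array
  // of child nodes of the same record type.
  static Schema buildNodeSchema() {
    // 1. Create the record schema first, with no fields attached yet,
    //    so that the fields below can refer to it.
    Schema node = Schema.createRecord("Node", null, "example", false);

    // 2. Create its fields; "children" refers back to the record itself.
    Schema.Field value =
        new Schema.Field("value", Schema.create(Schema.Type.STRING), null, null);
    Schema.Field children =
        new Schema.Field("children", Schema.createArray(node), null, null);

    // 3. setFields() attaches the fields, closing the recursive loop.
    node.setFields(Arrays.asList(value, children));
    return node;
  }

  public static void main(String[] args) {
    // Prints the schema as JSON, with "children" typed as
    // {"type":"array","items":"Node"} -- a reference by name.
    System.out.println(buildNodeSchema());
  }
}
```

The order matters: setFields() can only be called once, so the record must exist as an empty named schema before any field that references it is built.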
Doug