Better schema resolution
------------------------
Key: AVRO-762
URL: https://issues.apache.org/jira/browse/AVRO-762
Project: Avro
Issue Type: New Feature
Components: c
Reporter: Douglas Creager
Assignee: Douglas Creager
Attachments: 0001-Better-schema-resolution.patch
I've been working on a pretty major patch that changes the way the C library
implements schema resolution. Until now, we have compared the writer and
reader schemas every time we read a record from an Avro file, which is a fair
bit of wasted effort. The approach I'm taking with the new implementation is
to split schema resolution and binary parsing into separate operations.

There's a new "consumer" API, which defines a set of callbacks for processing
Avro data that conforms to a schema. The new avro_consume_binary function
reads binary-encoded Avro data from a buffer or file, and passes that data into
a consumer instance. Each consumer instance is associated with the writer
schema of the data that it expects to process.
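
To make the consumer idea concrete, here is a minimal, self-contained sketch
of the pattern. It is not the API from the patch: every sketch_* name and the
shape of the callback struct are assumptions made for illustration. The only
part taken from Avro itself is the zigzag varint encoding of long values. The
parser decodes the bytes and hands each value to a callback, and the consumer
decides what to do with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer: one callback per kind of value the writer
 * schema can produce. Only two kinds are shown here. */
typedef struct sketch_consumer {
    int (*long_value)(struct sketch_consumer *self, int64_t value);
    int (*boolean_value)(struct sketch_consumer *self, int value);
    void *user_data;
} sketch_consumer_t;

/* Decode one zigzag varint long from buf and pass it to the consumer.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t sketch_consume_long(const uint8_t *buf, size_t len,
                                  sketch_consumer_t *c)
{
    uint64_t raw = 0;
    size_t i = 0;
    int shift = 0;

    for (;;) {
        if (i >= len || shift > 63)
            return 0;                      /* truncated or overlong input */
        uint8_t b = buf[i++];
        raw |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80))
            break;                         /* high bit clear: last byte */
        shift += 7;
    }

    /* Undo the zigzag mapping (0, -1, 1, -2, ... encoded as 0, 1, 2, 3, ...). */
    int64_t value = (int64_t)(raw >> 1) ^ -(int64_t)(raw & 1);
    return c->long_value(c, value) == 0 ? i : 0;
}

/* Example consumer callback: add each value to a running total. */
static int sketch_sum_long(sketch_consumer_t *self, int64_t value)
{
    *(int64_t *)self->user_data += value;
    return 0;
}

/* Drive the parser over a buffer holding back-to-back encoded longs. */
static int64_t sketch_sum_longs(const uint8_t *buf, size_t len)
{
    int64_t total = 0;
    sketch_consumer_t c = { sketch_sum_long, NULL, &total };
    size_t pos = 0;

    while (pos < len) {
        size_t n = sketch_consume_long(buf + pos, len - pos, &c);
        if (n == 0)
            break;
        pos += n;
    }
    return total;
}
```

The parsing loop knows nothing about what happens to the values, so the same
loop could feed a consumer that builds a datum, one that prints, or one that
performs schema resolution.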
Schema resolution is now implemented in the new avro_resolver_new function,
which returns a consumer instance that knows how to translate from the writer
schema to the reader schema. As the resolver receives data via the consumer
API, it fills in the contents of a destination avro_datum_t (which should be an
instance of the reader schema).
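
The "resolve once, then reuse" idea can be sketched as follows. This is an
illustration under simplifying assumptions, not the code in the patch: the
sketch_* types are hypothetical, a schema is reduced to an ordered list of
field names, and every field is assumed to be a long. Reader fields that the
writer lacks (which would need default values) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_FIELDS 8

/* Hypothetical record schema: an ordered list of field names. */
typedef struct {
    size_t n;
    const char *names[SKETCH_MAX_FIELDS];
} sketch_schema_t;

/* Result of resolution: for each writer field, the index of the reader
 * field it maps to, or -1 if the reader does not want that field. */
typedef struct {
    size_t n_writer;
    int target[SKETCH_MAX_FIELDS];
} sketch_resolver_t;

/* Compare the two schemas once, up front. Returns 0 on success. */
static int sketch_resolver_init(sketch_resolver_t *r,
                                const sketch_schema_t *writer,
                                const sketch_schema_t *reader)
{
    if (writer->n > SKETCH_MAX_FIELDS || reader->n > SKETCH_MAX_FIELDS)
        return -1;

    r->n_writer = writer->n;
    for (size_t i = 0; i < writer->n; i++) {
        r->target[i] = -1;
        for (size_t j = 0; j < reader->n; j++) {
            if (strcmp(writer->names[i], reader->names[j]) == 0) {
                r->target[i] = (int)j;
                break;
            }
        }
    }
    return 0;
}

/* Consumer-style callback: receives one writer field value and stores it
 * in the destination record, or drops it if the reader skips that field. */
static void sketch_resolver_long_value(const sketch_resolver_t *r,
                                       size_t writer_field, int64_t value,
                                       int64_t *dest)
{
    int slot = r->target[writer_field];
    if (slot >= 0)
        dest[slot] = value;
}

/* Per-record work: no schema comparison, only table lookups. */
static void sketch_resolve_record(const sketch_resolver_t *r,
                                  const int64_t *writer_values,
                                  int64_t *dest)
{
    for (size_t i = 0; i < r->n_writer; i++)
        sketch_resolver_long_value(r, i, writer_values[i], dest);
}
```

With a writer schema of (id, obsolete, count) and a reader schema of
(count, id), sketch_resolver_init runs once, and each later call to
sketch_resolve_record reorders the fields and drops "obsolete" without
looking at the field names again.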
This work isn't complete yet: I still have to implement promotion (int->long
and friends) and add support for recursive schemas (via the AVRO_LINK
schema type). But I wanted to get the patch out there for people to view and
test in the meantime. This patch depends on a few of my other patches that
haven't made it into SVN yet; if you want to test the code without applying
those patches yourself, I have a tracking branch on
[github|https://github.com/dcreager/avro/tree/resolution].