Schema evolution in Gora

Lewis John Mcgibbney Tue, 01 Apr 2014 07:48:12 -0700

Hi Folks,
I've ended up in a conversation [0] over on user@avro regarding Schema
evolution.
Right now our workflow is as follows:


 * write an .avsc schema and use GoraCompiler to generate Persistent data
beans.
 * use the Persistent class whenever we wish to read from or write to the
data.

AFAICT, as explained in [0], this presents us with a problem. Namely that
we have very sketchy support for schema evolution over time.

We narrowly avoided a minor situation over in Nutch when we added a 'batchId'
field to our WebPage schema, as some tools ran into trouble when attempting to
read fields which were simply not present for some records.
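
To make the batchId case concrete, here is a rough sketch of the behaviour we
would want from a reader: when a stored record predates a field, fall back to
the default declared in the newer schema, and fail loudly if there is none.
This mirrors the idea behind Avro's reader/writer schema resolution, but it is
NOT Avro or Gora code. The trimmed-down WebPage schema, the resolve() helper
and the dict-based record are all assumptions made up for illustration.

```python
import json

# Hypothetical, trimmed-down WebPage schema (the real Nutch schema has many
# more fields). Every field carries a default so older records still resolve.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "WebPage",
  "fields": [
    {"name": "baseUrl", "type": ["null", "string"], "default": null},
    {"name": "batchId", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve(stored, reader_schema):
    """Project a stored record (a plain dict) onto the reader's schema.

    Illustrative helper only: fields missing from the stored record take the
    reader's declared default; a missing field without a default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in stored:
            resolved[name] = stored[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise KeyError(f"field {name!r} missing from record and has no default")
    return resolved


# A record written before 'batchId' existed.
old_record = {"baseUrl": "http://example.org/"}
page = resolve(old_record, READER_SCHEMA)
```

The point of the sketch is that the reader never touches a field which is not
there; it either gets the default or a clear error naming the field.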

So this thread is open for discussion of what we can/must do to
improve this.
Should we store the Schema along with the data?
Should we store a Hash of the Schema along with the data?
Should we support Schema versioning?
Should we support Schema fingerprinting?
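
As a rough illustration of the hash/fingerprint options, the sketch below
stores a short fingerprint next to each record and looks the writer's schema up
again at read time. It is only a sketch under assumptions: the canonical form
here is simply sorted-key, whitespace-free JSON (Avro defines its own Parsing
Canonical Form and fingerprint algorithms, which this does not implement), and
the SCHEMAS registry, store() and writer_schema_for() names are hypothetical.

```python
import hashlib
import json


def fingerprint(schema_json: str) -> str:
    """SHA-256 over a normalised rendering of the schema JSON.

    Assumption: sorted keys and no whitespace stand in for a real canonical
    form, so formatting differences do not change the fingerprint.
    """
    canonical = json.dumps(json.loads(schema_json), sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical versions of a trimmed-down WebPage schema.
V1 = ('{"type": "record", "name": "WebPage", "fields": ['
      '{"name": "baseUrl", "type": "string"}]}')
V2 = ('{"type": "record", "name": "WebPage", "fields": ['
      '{"name": "baseUrl", "type": "string"}, '
      '{"name": "batchId", "type": ["null", "string"], "default": null}]}')

# Hypothetical registry: fingerprint -> schema text, kept alongside the store.
SCHEMAS = {fingerprint(s): s for s in (V1, V2)}


def store(record: dict, schema_json: str) -> dict:
    """Wrap a record with the fingerprint of the schema it was written with."""
    return {"schema_fp": fingerprint(schema_json), "data": record}


def writer_schema_for(stored: dict) -> str:
    """Recover the schema a stored record was written with."""
    return SCHEMAS[stored["schema_fp"]]


row = store({"baseUrl": "http://example.org/"}, V1)
```

Storing only the fingerprint keeps the per-record overhead fixed, at the cost
of having to keep every schema version we have ever written with somewhere
retrievable. Storing the full schema with the data avoids the registry but
costs far more space per record.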

Of course this is something for the 0.5-SNAPSHOT development drive but it
is something which we need to sort out as time goes on.

Ta
Lewis

[0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html

-- 
*Lewis*
