Re: Schema evolution in Gora

Henry Saputra Tue, 08 Apr 2014 02:13:25 -0700

Technically it was named after a dog, hence the logo, and the name just
happens to match that abbreviation :)


On Tuesday, April 1, 2014, Renato Marroquín Mogrovejo <
[email protected]> wrote:

> Hi Lewis,
>
> This is for sure a very interesting topic and something that GORA should
> deal with.
> It is funny that only now I found out that GORA actually means "Generic
> Object Representation using Avro". Does this mean that we will always have
> to use Avro for everything? Never mind, we can all discuss this when the
> time comes.
> From the little reading I did about data evolution:
> - Schema along with data -> This could be done in a similar way as we are
> approaching the union fields, i.e. append an extra field to the data with
> its schema, deserialize the schema, and then check whether the data can
> actually satisfy the query. Of course this would be part of 0.5 :)
> - Hash of the Schema along with the data, Schema versioning, Schema
> fingerprinting ->
> This needs some way of looking up saved schemas (versions, hashes, or
> schema fingerprints).
>
>
> Renato M.
>
>
> 2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney <[email protected]>:
>
> > Hi Folks,
> > I've ended up in a conversation [0] over on user@avro regarding Schema
> > evolution.
> > Right now our workflow is as follows
> >
> >  * write .avsc schema and use GoraCompiler to generate Persistent data
> > beans.
> >  * use the Persistent class whenever we wish to read from or write to the
> > data.
> >
> > AFAICT, as explained in [0], this presents us with a problem. Namely that
> > we have very sketchy support for Schema evolution over time.
> >
> > We narrowly avoided a minor situation over in Nutch when we added a
> > 'batchId' Field to our WebPage Schema, as some Tools were attempting to
> > read Fields which were simply not present for some records.
> >
> > So this thread is open for discussion of what we can/must do to
> > improve this.
> > Should we store the Schema along with the data?
> > Should we store a Hash of the Schema along with the data?
> > Should we support Schema versioning?
> > Should we support Schema fingerprinting?
> >
> > Of course this is something for the 0.5-SNAPSHOT development drive but it
> > is something which we need to sort out as time goes on.
> >
> > Ta
> > Lewis
> >
> > [0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html
> >
> > --
> > *Lewis*
> >
>
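
The lookup Renato describes (a hash or fingerprint stored with each record,
resolved against saved schemas) could be sketched roughly as follows. This is
plain Python for illustration only, not Gora or Avro API: `fingerprint`,
`SchemaRegistry`, `write_record` and `writer_fields` are hypothetical names,
and hashing sorted-key JSON with SHA-256 is an assumption made here. Avro
defines its own Parsing Canonical Form and fingerprint algorithms, which a
real implementation would presumably use instead.

```python
import hashlib
import json


def fingerprint(schema: dict) -> str:
    """Hash a schema dict; sorted keys make the hash independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaRegistry:
    """Hypothetical in-memory store mapping fingerprint -> schema."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema: dict) -> str:
        fp = fingerprint(schema)
        self._schemas[fp] = schema
        return fp

    def lookup(self, fp: str) -> dict:
        # Raises KeyError if the fingerprint was never registered.
        return self._schemas[fp]


def write_record(registry: SchemaRegistry, schema: dict, data: dict) -> dict:
    """Store only the fingerprint next to the data, not the whole schema."""
    return {"schema_fp": registry.register(schema), "data": data}


def writer_fields(registry: SchemaRegistry, stored: dict) -> set:
    """Names of the fields declared by the schema the record was written with."""
    schema = registry.lookup(stored["schema_fp"])
    return {field["name"] for field in schema["fields"]}
```

With something like this, a tool could check `writer_fields(...)` before
touching a field such as 'batchId', at the cost of keeping every schema
version available for lookup.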

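The 'batchId' situation Lewis mentions is the case Avro's schema resolution
rules cover with defaults: when the reader's schema has a field that the
writer's schema lacks, the reader uses that field's default value if one is
declared. The sketch below only mimics that rule in plain Python on dicts. It
is not Avro or Gora code; `resolve` and `READER_SCHEMA` are hypothetical names
and the WebPage fields shown are illustrative.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a stored record onto the reader's schema, filling defaults."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            # Field was present when the record was written.
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added later; fall back to the declared default.
            resolved[name] = field["default"]
        else:
            raise KeyError(f"field {name!r} missing and has no default")
    return resolved


# Illustrative reader schema: 'batchId' was added later, with a default.
READER_SCHEMA = {
    "type": "record",
    "name": "WebPage",
    "fields": [
        {"name": "baseUrl", "type": "string"},
        {"name": "batchId", "type": ["null", "string"], "default": None},
    ],
}
```

Under this approach an old record without 'batchId' reads back with the
default, while a field added without a default still fails loudly.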