Re: Schema evolution in Gora

Talat Uyarer Tue, 08 Apr 2014 04:59:02 -0700

Hi all,

IMHO we could store a new field, a "recipe" for the persistent, alongside
each written record. The recipe field records which serializer was used
for each field of the record. The recipe itself is written with the
string serializer. Every time data is read from the store, the recipe is
deserialized first, and the data object is then built from the recipe's
schema. The recipe is similar to the persistent's schema, but some
definitions differ and it carries extra information about the fields. For
example, suppose the persistent's schema has a union field like this:


{"name": "name", "type": ["null","string"],"default":null}

If the value is serialized with the string serializer, the recipe field contains:

{"name": "name", "type": "string","default":null}

Thus the name field can be deserialized without the persistent's schema.
Another benefit: if the persistent's schema is changed, we can still
deserialize old records without any other information.
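
To make the idea concrete, here is a rough sketch in Python rather than Gora's Java. It is not Gora code: the SERIALIZERS table, branch_for, write_record and read_record are all hypothetical names, and the stored record is just a dict. It only shows the recipe being written next to the data and the reader using nothing but the recipe.

```python
import json

# Hypothetical per-type serializers (encode, decode); a real store would
# plug in its own. Only two types are covered in this sketch.
SERIALIZERS = {
    "string": (lambda v: v.encode("utf-8"), lambda b: b.decode("utf-8")),
    "long": (lambda v: str(v).encode("ascii"), lambda b: int(b.decode("ascii"))),
}


def branch_for(value):
    """Pick the concrete type name that was actually used for a value."""
    if value is None:
        return "null"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "long"
    raise TypeError("unsupported value: %r" % (value,))


def write_record(schema_fields, record):
    """Serialize a record and store a recipe describing how it was written."""
    recipe, payload = [], {}
    for field in schema_fields:
        name = field["name"]
        value = record.get(name, field.get("default"))
        allowed = field["type"] if isinstance(field["type"], list) else [field["type"]]
        chosen = branch_for(value)
        if chosen not in allowed:
            raise ValueError("%s: %s is not one of %s" % (name, chosen, allowed))
        # The recipe entry replaces the union with the branch actually written.
        recipe.append(dict(field, type=chosen))
        if chosen != "null":
            payload[name] = SERIALIZERS[chosen][0](value)
    # The recipe itself is stored as a serialized JSON string.
    return {"recipe": json.dumps(recipe).encode("utf-8"), "fields": payload}


def read_record(stored):
    """Rebuild the record from the recipe alone, without the current schema."""
    recipe = json.loads(stored["recipe"].decode("utf-8"))
    result = {}
    for entry in recipe:
        name, type_name = entry["name"], entry["type"]
        if type_name == "null":
            result[name] = None
        else:
            result[name] = SERIALIZERS[type_name][1](stored["fields"][name])
    return result


schema = [{"name": "name", "type": ["null", "string"], "default": None}]
stored = write_record(schema, {"name": "Talat"})
restored = read_record(stored)
```

Note that read_record never sees the schema list, so a later change to the persistent's schema would not affect reading records that were written earlier.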

I hope this is understandable. :)

Talat

2014-04-08 12:11 GMT+03:00 Henry Saputra <[email protected]>:
> Technically it was named after a dog, hence the logo, which just happens to
> match that abbreviation :)
>
> On Tuesday, April 1, 2014, Renato Marroquín Mogrovejo <
> [email protected]> wrote:
>
>> Hi Lewis,
>>
>> This is for sure very interesting and something that GORA should deal
>> with.
>> It is funny that only now I found out that GORA actually means "Generic
>> Object Representation using Avro". Does this mean that we will always have
>> to use Avro for everything? Never mind, we can all discuss this when the
>> time comes.
>> From the little reading I did about data evolution:
>> - Schema along with data -> This could be done in a similar way as we are
>> approaching the union fields i.e. append an extra field to the data with
>> its schema, deserialize the schema, and then check if the data can actually
>> suffice the query or not. Of course this would be part of 0.5 :)
>> - Hash of the Schema along with the data, Schema versioning, Schema
>> fingerprinting ->
>> This needs some way of looking up saved schemas (versions, hashes, or
>> schema fingerprints).
>>
>>
>> Renato M.
>>
>>
>> 2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney <[email protected]>:
>>
>> > Hi Folks,
>> > I've ended up in a conversation [0] over on user@avro regarding Schema
>> > evolution.
>> > Right now our workflow is as follows
>> >
>> >  * write .avsc schema and use GoraCompiler to generate Persistent data
>> > beans.
>> >  * use the Persistent class whenever we wish to read from or write to the
>> > data.
>> >
>> > AFAICT, as explained in [0], this presents us with a problem. Namely that
>> > we have very sketchy support for Schema evolution over time.
>> >
>> > We narrowly avoided a minor situation over in Nutch when we added a
>> > 'batchId' Field to our WebPage Schema, as some Tools were attempting to
>> > read Fields which were simply not present for some records.
>> >
>> > So this thread is opened to discussion surrounding what we can/must do to
>> > improve this.
>> > Should we store the Schema along with the data?
>> > Should we store a Hash of the Schema along with the data?
>> > Should we support Schema versioning?
>> > Should we support Schema fingerprinting?
>> >
>> > Of course this is something for the 0.5-SNAPSHOT development drive but it
>> > is something which we need to sort out as time goes on.
>> >
>> > Ta
>> > Lewis
>> >
>> > [0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html
>> >
>> > --
>> > *Lewis*
>> >
>>



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
