Hi folks,

What do you think? We should solve this problem so that serialization and deserialization are stable. If we settle on a solution, I can work on it; I have the time.

Talat

2014-04-10 14:48 GMT+03:00 Alparslan Avcı <[email protected]>:
> Hi folks,
>
> I also think that "schema evolution over time" is an important problem that
> we should handle. Because of this, it is really hard to extend the data
> schema of any application which uses Gora. We've experienced this in Nutch.
>
> About the proposed solutions:
>
> - "Should we store the Schema along with the data?" -> IMHO, we should store
> the schema, but we should also discuss the way we store it. Talat's
> 'recipe' can be a good option for this, and moreover, I am thinking of storing
> all field schemas separately instead of storing the persistent schema in one
> piece. Although storing every field schema is more complex than storing only
> one big persistent schema, it will give us more extensibility and easier
> backward compatibility. And again for field schemas, we should discuss the way
> of storing (serialized or not serialized? stored where? etc.).
>
> - "Should we store a Hash of the Schema along with the data? Should we
> support Schema versioning? Should we support Schema fingerprinting?" -> We
> may need to support schema versioning, since it may help to compare
> evaluated schemas. But if we store the schema, we won't need to store the
> hash, or support fingerprinting, I think.
>
> Alparslan
>
> On 08-04-2014 14:57, Talat Uyarer wrote:
>>
>> Hi all,
>>
>> IMHO we can store a NEW field called "recipe of persistent" with each
>> written record. The recipe field stores information about which field has
>> been serialized with which serializer. It is stored serialized with the
>> string serializer. Every time data is read from the store, the recipe is
>> deserialized, and the data object is generated from this recipe's
>> schema. The recipe field is similar to the persistent's schema, but it
>> has some different definitions and extra information about fields.
>> For example, say the persistent's schema has a union field like the one
>> below:
>>
>> {"name": "name", "type": ["null","string"],"default":null}
>>
>> If it is serialized by the string serializer, it is written in the recipe
>> field as
>>
>> {"name": "name", "type": "string","default":null}
>>
>> Thus the name field can be deserialized without the persistent's schema.
>> Another benefit: if the persistent's schema is changed, we can still
>> deserialize without any other information.
>>
>> I hope this is understandable. :)
>>
>> Talat
>>
>> 2014-04-08 12:11 GMT+03:00 Henry Saputra <[email protected]>:
>>>
>>> Technically it was named after a dog, hence the logo, which just happens
>>> to match that abbreviation :)
>>>
>>> On Tuesday, April 1, 2014, Renato Marroquín Mogrovejo <[email protected]>
>>> wrote:
>>>
>>>> Hi Lewis,
>>>>
>>>> This is for sure a very interesting topic and something that GORA should
>>>> deal with.
>>>> It is funny that only now I found out that GORA actually means "Generic
>>>> Object Representation using Avro". Does this mean that we will always have
>>>> to use Avro for everything? Never mind, we can all discuss this when the
>>>> time comes.
>>>> From the little reading I did about data evolution:
>>>> - Schema along with data -> This could be done in a similar way to how we
>>>> are approaching the union fields, i.e. append an extra field to the data
>>>> with its schema, deserialize the schema, and then check whether the data
>>>> can actually satisfy the query or not. Of course this would be part of
>>>> 0.5 :)
>>>> - Hash of the Schema along with the data, Schema versioning, Schema
>>>> fingerprinting -> This needs some way of looking up saved schemas
>>>> (versions, hashes, or schema fingerprints).
>>>>
>>>> Renato M.
>>>>
>>>> 2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney <[email protected]>:
>>>>>
>>>>> Hi Folks,
>>>>> I've ended up in a conversation [0] over on user@avro regarding Schema
>>>>> evolution.
>>>>> Right now our workflow is as follows:
>>>>>
>>>>> * write an .avsc schema and use GoraCompiler to generate Persistent data
>>>>> beans.
>>>>> * use the Persistent class whenever we wish to read from or write to the
>>>>> data.
>>>>>
>>>>> AFAICT, as explained in [0], this presents us with a problem. Namely
>>>>> that we have very sketchy support for Schema evolution over time.
>>>>>
>>>>> We narrowly avoided a minor situation over in Nutch when we added a
>>>>> 'batchId' field to our WebPage schema, as some tools were attempting to
>>>>> read fields which were simply not present for some records.
>>>>>
>>>>> So this thread is opened for discussion surrounding what we can/must do
>>>>> to improve this.
>>>>> Should we store the Schema along with the data?
>>>>> Should we store a Hash of the Schema along with the data?
>>>>> Should we support Schema versioning?
>>>>> Should we support Schema fingerprinting?
>>>>>
>>>>> Of course this is something for the 0.5-SNAPSHOT development drive, but
>>>>> it is something which we need to sort out as time goes on.
>>>>>
>>>>> Ta
>>>>> Lewis
>>>>>
>>>>> [0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html
>>>>>
>>>>> --
>>>>> *Lewis*

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
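
To make the "recipe" proposal in the thread concrete, here is a minimal sketch in plain Python. It is not Gora or Avro code: the `__recipe__` key, the `ENCODERS`/`DECODERS` tables, and the functions `write_record` and `read_record` are all hypothetical names chosen for illustration, and only three types are handled. It shows the one idea from the thread: at write time each union is resolved to the concrete type actually used, and that per-record recipe alone is enough to read the record back.

```python
import json

# Hypothetical per-type serializers (assumption: real Gora serializers are
# store-specific and look nothing like these toy byte encodings).
ENCODERS = {
    "null": lambda v: b"",
    "string": lambda v: v.encode("utf-8"),
    "long": lambda v: str(v).encode("ascii"),
}
DECODERS = {
    "null": lambda b: None,
    "string": lambda b: b.decode("utf-8"),
    "long": lambda b: int(b.decode("ascii")),
}
PYTHON_TYPES = {"null": type(None), "string": str, "long": int}


def concrete_type(declared, value):
    """Return the single type actually used for value, resolving unions."""
    branches = declared if isinstance(declared, list) else [declared]
    for branch in branches:
        if isinstance(value, PYTHON_TYPES[branch]):
            return branch
    raise TypeError(f"{value!r} matches none of {branches}")


def write_record(schema, record):
    """Serialize each field and store a recipe naming the type used per field."""
    recipe, stored = [], {}
    for field in schema["fields"]:
        name = field["name"]
        value = record.get(name, field.get("default"))
        used = concrete_type(field["type"], value)
        recipe.append({"name": name, "type": used, "default": field.get("default")})
        stored[name] = ENCODERS[used](value)
    # The recipe itself is stored as a serialized string next to the data.
    stored["__recipe__"] = json.dumps(recipe).encode("utf-8")
    return stored


def read_record(stored):
    """Rebuild the record from the stored recipe only, without the schema."""
    recipe = json.loads(stored["__recipe__"].decode("utf-8"))
    return {f["name"]: DECODERS[f["type"]](stored[f["name"]]) for f in recipe}


# Usage with the union field quoted in the thread.
schema = {"fields": [{"name": "name", "type": ["null", "string"], "default": None}]}
stored = write_record(schema, {"name": "gora"})
print(stored["__recipe__"].decode("utf-8"))
print(read_record(stored))
```

Because `read_record` never looks at `schema`, a later change to the persistent's schema does not affect records that were already written; the cost is one extra stored field per record, which is the trade-off raised in the thread against storing only a hash or version.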

