+1 to start, but at this point there is no solution yet, so it looks like it is open for solution proposals.
- Henry

On Tue, Jul 22, 2014 at 9:43 AM, Talat Uyarer <[email protected]> wrote:
> Hi Folks,
>
> Wdyt? We should solve this problem for stable serialization and
> deserialization. If we decide on a solution, I can work on it. I have
> time.
>
> Talat
>
> 2014-04-10 14:48 GMT+03:00 Alparslan Avcı <[email protected]>:
>> Hi folks,
>>
>> I also think that "schema evolution over time" is an important problem
>> that we should handle. Because of this, it is really hard to extend the
>> data schema in any application which uses Gora. We have experienced this
>> in Nutch.
>>
>> About the proposed solutions:
>>
>> - "Should we store the Schema along with the data?" -> IMHO, we should
>> store the schema, but we should also discuss the way we store it. Talat's
>> 'recipe' can be a good option for this; moreover, I would consider storing
>> all field schemas separately instead of storing the persistent schema in
>> one piece. Although storing every field schema is more complex than
>> storing only one big persistent schema, it would give us more
>> extensibility and easier backward compatibility. And again for field
>> schemas, we should discuss how to store them (serialized or not? stored
>> where? etc.).
>>
>> - "Should we store a Hash of the Schema along with the data? Should we
>> support Schema versioning? Should we support Schema fingerprinting?" ->
>> We may need to support schema versioning, since it can help to compare
>> evolved schemas. But if we store the schema itself, I think we won't need
>> to store a hash or support fingerprinting.
>>
>> Alparslan
>>
>> On 08-04-2014 14:57, Talat Uyarer wrote:
>>> Hi all,
>>>
>>> IMHO we can store a NEW field called the "recipe" of the persistent
>>> record. The recipe field stores information about which field has been
>>> serialized with which serializer. It is itself stored serialized with
>>> the string serializer. Every time data is read from the store, it is
>>> deserialized.
>>> And the data object is then generated from the recipe's schema. The
>>> recipe field is stored similarly to the persistent schema, but it has
>>> some different definitions and extra information about the fields. For
>>> example, the persistent schema may have a union field similar to the
>>> one below:
>>>
>>> {"name": "name", "type": ["null","string"], "default": null}
>>>
>>> If it is serialized by the string serializer, it is written in the
>>> recipe field as:
>>>
>>> {"name": "name", "type": "string", "default": null}
>>>
>>> Thus the name field can be deserialized without the persistent schema.
>>> Another benefit: if the persistent schema is changed, we can still
>>> deserialize without any extra information.
>>>
>>> I hope I am being understandable. :)
>>>
>>> Talat
>>>
>>> 2014-04-08 12:11 GMT+03:00 Henry Saputra <[email protected]>:
>>>> Technically it was named after a dog, hence the logo, which just
>>>> happens to match that abbreviation :)
>>>>
>>>> On Tuesday, April 1, 2014, Renato Marroquín Mogrovejo
>>>> <[email protected]> wrote:
>>>>> Hi Lewis,
>>>>>
>>>>> This is for sure very interesting and something that Gora should
>>>>> deal with.
>>>>> It is funny that only now I found out that GORA actually means
>>>>> "Generic Object Representation using Avro". Does this mean that we
>>>>> will always have to use Avro for everything? Never mind, we can all
>>>>> discuss this when the time comes.
>>>>> From the little reading I did about data evolution:
>>>>> - Schema along with the data -> This could be done in a similar way
>>>>> to how we are approaching the union fields, i.e. append an extra
>>>>> field to the data with its schema, deserialize the schema, and then
>>>>> check whether the data can actually satisfy the query or not. Of
>>>>> course this would be part of 0.5 :)
>>>>> - Hash of the Schema along with the data, Schema versioning, Schema
>>>>> fingerprinting -> This needs some way of looking up saved schemas
>>>>> (versions, hashes, or schema fingerprints).
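The schema-lookup approach sketched in the last point above (store a compact fingerprint with each record, keep the full schemas in a registry keyed by that fingerprint) could look roughly like this. Python is used for brevity even though Gora itself is Java, SHA-256 over sorted JSON stands in for Avro's real parsing canonical form and Rabin fingerprint, and every name here is hypothetical rather than actual Gora or Avro API:

```python
import hashlib
import json

# Hypothetical sketch: hash a normalized form of the schema and store only
# the digest alongside each record; full schemas live in a side registry,
# one entry per schema version. Avro defines its own parsing canonical
# form and a 64-bit Rabin fingerprint; sorted-key JSON + SHA-256 is used
# here purely for illustration.

def fingerprint(schema):
    """Return a stable hex digest of a normalized schema representation."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

registry = {}  # fingerprint -> full schema

def register(schema):
    """Record a schema version and return the fingerprint to store with rows."""
    fp = fingerprint(schema)
    registry[fp] = schema
    return fp

# Two versions of a WebPage-like schema; v2 adds an optional 'batchId'.
v1 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"}]}
v2 = {"type": "record", "name": "WebPage",
      "fields": [{"name": "url", "type": "string"},
                 {"name": "batchId", "type": ["null", "string"],
                  "default": None}]}

fp1, fp2 = register(v1), register(v2)
assert fp1 != fp2           # any field change yields a new fingerprint
writer_schema = registry[fp2]  # looked up from the digest stored with a row
```

With the writer schema recovered from the registry, normal Avro schema resolution (writer schema vs. current reader schema, with field defaults) could then handle records written before 'batchId' existed.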
>>>>>
>>>>> Renato M.
>>>>>
>>>>> 2014-04-01 16:47 GMT+02:00 Lewis John Mcgibbney
>>>>> <[email protected]>:
>>>>>> Hi Folks,
>>>>>> I've ended up in a conversation [0] over on user@avro regarding
>>>>>> Schema evolution.
>>>>>> Right now our workflow is as follows:
>>>>>>
>>>>>> * write an .avsc schema and use GoraCompiler to generate Persistent
>>>>>> data beans.
>>>>>> * use the Persistent class whenever we wish to read or write the
>>>>>> data.
>>>>>>
>>>>>> AFAICT, as explained in [0], this presents us with a problem, namely
>>>>>> that we have very sketchy support for Schema evolution over time.
>>>>>>
>>>>>> We narrowly avoided a minor situation over in Nutch when we added a
>>>>>> 'batchId' field to our WebPage Schema, as some tools were attempting
>>>>>> to read fields which were simply not present for some records.
>>>>>>
>>>>>> So this thread is opened for discussion surrounding what we can/must
>>>>>> do to improve this.
>>>>>> Should we store the Schema along with the data?
>>>>>> Should we store a Hash of the Schema along with the data?
>>>>>> Should we support Schema versioning?
>>>>>> Should we support Schema fingerprinting?
>>>>>>
>>>>>> Of course this is something for the 0.5-SNAPSHOT development drive,
>>>>>> but it is something which we need to sort out as time goes on.
>>>>>>
>>>>>> Ta
>>>>>> Lewis
>>>>>>
>>>>>> [0] http://www.mail-archive.com/user%40avro.apache.org/msg02748.html
>>>>>>
>>>>>> --
>>>>>> *Lewis*
>
> --
> Talat UYARER
> Websitesi: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
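For reference, Talat's "recipe" proposal above (record, next to each written record, the concrete type each union field was actually serialized with, so a value can be deserialized later without the full persistent schema) could be sketched as follows. This is a minimal illustration in Python (Gora itself is Java); the helper names and the type mapping are assumptions, not Gora API, and real Avro union resolution is more involved:

```python
import json

# Illustrative Python-to-Avro type mapping; real Avro union resolution
# considers named types, nesting, and more than these primitives.
PY_TO_AVRO = {type(None): "null", str: "string", int: "long",
              float: "double", bool: "boolean", bytes: "bytes"}

def build_recipe(fields, record):
    """Resolve every union field to the branch actually used for `record`."""
    recipe = []
    for field in fields:
        entry = dict(field)
        if isinstance(field["type"], list):   # an Avro union, e.g. ["null","string"]
            entry["type"] = PY_TO_AVRO[type(record[field["name"]])]
        recipe.append(entry)
    return recipe

# Hypothetical persistent schema fields, mirroring Talat's example.
persistent_schema = [
    {"name": "name", "type": ["null", "string"], "default": None},
    {"name": "fetchTime", "type": "long", "default": 0},
]

record = {"name": "Apache Gora", "fetchTime": 1396966800}
recipe = build_recipe(persistent_schema, record)
print(json.dumps(recipe[0]))
# → {"name": "name", "type": "string", "default": null}
```

As in Talat's example, the union `["null","string"]` collapses in the recipe to the concrete branch `"string"`, so the field can be read back even if the persistent schema changes later.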

