[ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312427#comment-14312427 ]
Alfonso Nishikawa edited comment on GORA-401 at 2/9/15 4:37 PM:
----------------------------------------------------------------

@[~hsaputra], AvroStore uses SpecificDatumWriter [1] when persisting (as [~renato2099] argued).
Even with PersistentDatumWriter, _I think_ it would write the whole record as binary/JSON rather than updating individual fields. If I am not mistaken, AvroStore only appends to a file and cannot update existing records [2].

I think AvroStore has other issues as well, such as needing to be closed before all data is written to disk (at least, I had trouble reading data when the AvroStore had not been closed first).

I am not comfortable modifying AvroStore's behavior extensively just to get the tests to pass.

Thanks!

[1] - https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/avro/store/AvroStore.java#L238
[2] - https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/avro/store/AvroStore.java#L183


> Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce
> -------------------------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but logically seems to hold on gora-0.5. HBase backend.
>            Reporter: Alfonso Nishikawa
>            Assignee: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>             Fix For: 0.7
>
>         Attachments: GORA-401-tests.patch, GORA-401v1.patch, GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch
>
>   Original Estimate: 35h
>          Time Spent: 21h
>  Remaining Estimate: 14h
>
> After the __g__dirty field was removed in GORA-326, the dirty field is no longer serialized.
> In GORA-321, {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
> switched from {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
> to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty
> field to Avro (although it is not really desirable to have that field as a
> main field in the entities).
> The proposal is to reintroduce {{PersistentDatumWriter/Reader}}, which
> will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its
> phases, serializes entities (from Map to Reduce), and on deserialization
> finds all fields marked as "dirty", regardless of which fields were modified
> in the Map. It then overwrites all data in the datastore (deleting a lot:
> downloaded content, parsed content, etc.).
> This effect can be seen in
> {{TestPersistentSerialization#testSerderEmployeeTwoFields}} when debugging in
> {{TestIOUtils#testSerializeDeserialize}}. With suitable breakpoints and
> inspection, one can see that entities are considered "equal" when their
> fields are equal. This is fine as a definition of "equal", but another test
> must be added to check that serialization and deserialization keep the
> dirty state.
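
To make the failure mode in the description concrete, here is a minimal, self-contained sketch in plain Java. It is not Gora code: DirtyStateSketch, Employee and roundTrip are hypothetical names, and the assumption that deserialization goes through the entity's setters (which is what marks every field dirty) is a simplification chosen for illustration. The sketch compares a round trip that writes only the field values with one that also writes the dirty bits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical illustration only; none of these names are Gora classes.
class DirtyStateSketch {

    /** Toy entity with two fields and one dirty bit per field. */
    static class Employee {
        static final int NAME = 0;
        static final int SALARY = 1;
        static final int FIELD_COUNT = 2;

        private String name = "";
        private int salary;
        private BitSet dirty = new BitSet(FIELD_COUNT);

        // Each setter marks its field dirty (assumed behavior for this sketch).
        void setName(String value) { name = value; dirty.set(NAME); }
        void setSalary(int value) { salary = value; dirty.set(SALARY); }

        boolean isDirty(int field) { return dirty.get(field); }
        void clearDirty() { dirty.clear(); }
    }

    /**
     * Serializes and deserializes an entity in memory. With keepDirtyState
     * set to false only the field values travel; with true the dirty bits
     * are written after the fields and restored on the reading side.
     */
    static Employee roundTrip(Employee source, boolean keepDirtyState) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(source.name);
                out.writeInt(source.salary);
                if (keepDirtyState) {
                    byte[] bits = source.dirty.toByteArray();
                    out.writeInt(bits.length);
                    out.write(bits);
                }
            }

            Employee copy = new Employee();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // Populating through setters marks every field dirty.
                copy.setName(in.readUTF());
                copy.setSalary(in.readInt());
                if (keepDirtyState) {
                    byte[] bits = new byte[in.readInt()];
                    in.readFully(bits);
                    // Replace the "all dirty" state with the original bits.
                    copy.dirty = BitSet.valueOf(bits);
                }
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.setName("Ana");
        e.setSalary(1000);
        e.clearDirty();     // as if freshly loaded from the datastore
        e.setSalary(1200);  // the "Map" side modifies a single field

        Employee fieldsOnly = roundTrip(e, false);
        Employee withDirty = roundTrip(e, true);
        System.out.println("fields only: name dirty = "
                + fieldsOnly.isDirty(Employee.NAME));
        System.out.println("with dirty bits: name dirty = "
                + withDirty.isDirty(Employee.NAME));
    }
}
```

In this sketch the fields-only round trip reports the untouched name field as dirty, while the round trip that carries the dirty bits reports only salary as dirty. A test along these lines, written against the real entities, is the kind of check the description asks for in addition to the existing field-equality comparison.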