[ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312427#comment-14312427 ]

Alfonso Nishikawa edited comment on GORA-401 at 2/9/15 4:37 PM:
----------------------------------------------------------------

@[~hsaputra], AvroStore uses SpecificDatumWriter [1] when persisting (as 
[~renato2099] argued). Anyway, even using PersistentDatumWriter, _I think_ it 
writes the whole record in binary/JSON, not just the updated fields. If I am not 
wrong, what I saw is that AvroStore only appends to a file (it is unable to 
update records) [2].
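
To make the append-only point concrete, here is a toy model of a store that can only append whole records. `AppendOnlyLog` is a hypothetical class, not AvroStore's actual code: a later write carrying fewer fields does not update the earlier record, it simply becomes the newest snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy append-only store (hypothetical, not AvroStore's code): each put()
// appends a full snapshot of the record; nothing is ever rewritten in place.
class AppendOnlyLog {
    private static final class Entry {
        final String key;
        final Map<String, String> fields;

        Entry(String key, Map<String, String> fields) {
            this.key = key;
            this.fields = fields;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Appends a defensive copy of the whole record.
    void put(String key, Map<String, String> fields) {
        entries.add(new Entry(key, new HashMap<>(fields)));
    }

    // Scans the whole log; the most recent snapshot for the key wins.
    Map<String, String> latest(String key) {
        Map<String, String> result = null;
        for (Entry e : entries) {
            if (e.key.equals(key)) {
                result = e.fields;
            }
        }
        return result;
    }

    int size() {
        return entries.size();
    }
}
```

Writing `{name=Alice, salary=100}` and then `{name=Bob}` for the same key leaves two entries in the log, and the latest snapshot no longer has a `salary`: there is no way to express "update only this field".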

I think there are more issues with AvroStore, such as the need to close it to get 
all data written to disk (at least, I had some trouble reading data when 
the AvroStore had not been closed first).
I don't feel comfortable modifying the behavior of AvroStore excessively 
just to get the tests to pass.
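
A likely cause of the close() issue, assuming AvroStore writes through a buffering stream or encoder (Avro documents that the encoder returned by `EncoderFactory.binaryEncoder` may buffer its output until `flush()`), is that small writes stay in memory until the writer is flushed or closed. The sketch uses only `java.io` to show the effect; `BufferedWriteDemo` is a hypothetical name and the byte array stands in for the file on disk.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Illustration only: a buffered stream in front of a "disk" (a byte array here).
class BufferedWriteDemo {
    // Returns {bytes on disk before close, bytes on disk after close}.
    // Assumes payload is smaller than the 8 KiB buffer.
    static int[] bytesVisibleBeforeAndAfterClose(byte[] payload) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        int beforeClose = -1;
        try (OutputStream out = new BufferedOutputStream(disk, 8192)) {
            out.write(payload);          // a small write stays in the buffer
            beforeClose = disk.size();   // so nothing has reached the "disk" yet
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }                                // close() flushed the buffer to the "disk"
        return new int[] {beforeClose, disk.size()};
    }
}
```

Closing the stream (done here by try-with-resources) or calling `flush()` is what moves the bytes to the underlying output; a reader that opens the file while the writer still holds unflushed data sees an incomplete file.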

Thanks!

[1] - https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/avro/store/AvroStore.java#L238
[2] - https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/avro/store/AvroStore.java#L183


> Serialization and deserialization of Persistent does not hold the entity 
> dirty state from Map to Reduce
> -------------------------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but logically it seems to hold on 
> gora-0.5 as well. HBase backend.
>            Reporter: Alfonso Nishikawa
>            Assignee: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>             Fix For: 0.7
>
>         Attachments: GORA-401-tests.patch, GORA-401v1.patch, 
> GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch
>
>   Original Estimate: 35h
>          Time Spent: 21h
>  Remaining Estimate: 14h
>
> After the __g__dirty field was removed in GORA-326, the dirty field is no longer serialized. 
> In GORA-321 
> {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
>  went from using 
> {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
>  to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty 
> field to Avro (which would require having that field as a main field in 
> the entities, something that is really not desirable).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which 
> will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its 
> phases, serializes entities (from Map to Reduce), and on deserialization finds 
> all fields marked as "dirty", regardless of which fields were modified in the Map, 
> and so overwrites all data in the datastore (deleting a lot of data: downloaded 
> content, parsed content, etc.).
> This effect can be seen in 
> {{TestPersistentSerialization#testSerderEmployeeTwoFields}} when debugging in 
> {{TestIOUtils#testSerializeDeserialize}}. Setting breakpoints and inspecting the 
> entities shows that they are considered "equal" when their fields are equal. This is fine as 
> a definition of "equal", but another test must be added to check that 
> serialization and deserialization preserve the dirty state.
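
The damage described for Nutch can be reproduced with a toy store. The sketch assumes, as the description implies, that a put writes every field flagged as dirty and that a dirty field with no loaded value ends up deleting the stored data. `ToyColumnStore` and the field names are hypothetical; this is not Gora's HBase backend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical store (not Gora's HBaseStore): put() writes only the fields
// flagged as dirty, and a dirty field with no value deletes the stored column.
class ToyColumnStore {
    private final Map<String, String> row = new HashMap<>();

    void put(Map<String, String> values, Set<String> dirtyFields) {
        for (String field : dirtyFields) {
            String value = values.get(field);
            if (value == null) {
                row.remove(field);       // dirty but not loaded: column is wiped
            } else {
                row.put(field, value);   // dirty with a value: column is overwritten
            }
        }
    }

    Map<String, String> snapshot() {
        return new HashMap<>(row);
    }
}
```

If a job loads and modifies only `status`, writing with the dirty set `{status}` keeps `content` and `text` intact; writing the same entity with every field marked dirty (what happens when the dirty state is lost between Map and Reduce) deletes them.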


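The proposal (serialize the entity's internal dirty state along with its fields) can be sketched with plain `java.io` streams. `Employee` and `DirtyAwareSerializer` are hypothetical stand-ins, not Gora's `Persistent` or `PersistentDatumWriter/Reader`, and the wire format is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical entity (not Gora's Persistent API): two fields plus a dirty bitmap.
class Employee {
    static final int NAME = 0;
    static final int SALARY = 1;

    String name;
    long salary;
    final BitSet dirty = new BitSet();

    void setName(String value) { name = value; dirty.set(NAME); }
    void setSalary(long value) { salary = value; dirty.set(SALARY); }
    boolean isDirty(int field) { return dirty.get(field); }
}

// Hypothetical serializer in the spirit of PersistentDatumWriter/Reader: the
// dirty bitmap is written next to the field values and restored on read.
class DirtyAwareSerializer {
    static byte[] serialize(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] mask = e.dirty.toByteArray();  // internal state first
            out.writeInt(mask.length);
            out.write(mask);
            out.writeBoolean(e.name != null);     // then the regular fields
            if (e.name != null) {
                out.writeUTF(e.name);
            }
            out.writeLong(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static Employee deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] mask = new byte[in.readInt()];
            in.readFully(mask);
            Employee e = new Employee();
            // Assign fields directly, not via setters, so reading marks nothing dirty.
            if (in.readBoolean()) {
                e.name = in.readUTF();
            }
            e.salary = in.readLong();
            e.dirty.or(BitSet.valueOf(mask));     // restore the writer's dirty state
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

A round-trip test along these lines compares the dirty bitmap in addition to the field values, which is the extra assertion the description asks for; comparing field values alone would still pass if the bitmap were dropped on the way from Map to Reduce.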

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
