Alfonso Nishikawa created GORA-401:
--------------------------------------
Summary: Serialization and deserialization of Persistent does not
hold the entity dirty state
Key: GORA-401
URL: https://issues.apache.org/jira/browse/GORA-401
Project: Apache Gora
Issue Type: Bug
Components: gora-core
Affects Versions: 0.5, 0.4
Environment: Tested on gora-0.4, but logically it seems to apply to
gora-0.5 as well
Reporter: Alfonso Nishikawa
Priority: Critical
After the {{__g__dirty}} field was removed in GORA-326, the dirty state is no
longer serialized. In GORA-321,
{{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
went from using
{{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java]/Reader}}
to Avro's {{SpecificDatumWriter}}, delegating serialization of the dirty field
to Avro (even though it is really not desirable to have that field as a main
field in the entities).
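To make the loss concrete, here is a minimal, self-contained sketch. It does not use Avro or Gora: {{Employee}} and {{ValueOnlySerializer}} are hypothetical stand-ins, Java object serialization replaces Avro encoding, and it assumes (matching the symptom reported below) that the reader repopulates the entity through setters that flag each field as dirty. Only the field values cross the wire, so a record with one modified field comes back with every field dirty.

```java
import java.io.*;
import java.util.BitSet;

// Run with `java DirtyLossDemo.java`; the source launcher uses the first class in the file.
class DirtyLossDemo {
    public static void main(String[] args) {
        Employee original = new Employee();
        original.put(Employee.NAME, "Alice"); // only NAME was modified (e.g. in the Map)
        Employee copy = ValueOnlySerializer.roundTrip(original);
        System.out.println("before: " + original.dirtyBits()); // before: {0}
        System.out.println("after:  " + copy.dirtyBits());     // after:  {0, 1, 2}
    }
}

// Hypothetical stand-in for a generated entity; NOT Gora's Persistent API.
class Employee {
    static final int NAME = 0, SALARY = 1, SSN = 2, FIELD_COUNT = 3;
    private final Object[] values = new Object[FIELD_COUNT];
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    // Like a generated setter: stores the value and flags the field as dirty.
    void put(int field, Object value) {
        values[field] = value;
        dirty.set(field);
    }

    Object get(int field) { return values[field]; }

    BitSet dirtyBits() { return (BitSet) dirty.clone(); }
}

// Hypothetical value-only serializer, standing in for a plain datum writer/reader pair.
class ValueOnlySerializer {
    static Employee roundTrip(Employee e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Only the field values are written; the dirty bits are never serialized.
                for (int i = 0; i < Employee.FIELD_COUNT; i++) out.writeObject(e.get(i));
            }
            Employee copy = new Employee();
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // Repopulating through put() marks every field dirty, loaded or not.
                for (int i = 0; i < Employee.FIELD_COUNT; i++) copy.put(i, in.readObject());
            }
            return copy;
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```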
The proposal is to reintroduce {{PersistentDatumWriter/Reader}}, which will
serialize the entities' internal fields.
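The idea behind the proposal can be sketched as a writer/reader pair that serializes the entity's internal dirty bits alongside the field values and restores them after the values are read back. This is only an illustration of the approach, not the Gora 0.3 {{PersistentDatumWriter/Reader}} code: {{Employee}}, {{DirtyAwareSerializer}} and {{restoreDirty}} are hypothetical, and Java object serialization stands in for Avro encoding.

```java
import java.io.*;
import java.util.BitSet;

// Run with `java DirtyKeepingDemo.java`; the source launcher uses the first class in the file.
class DirtyKeepingDemo {
    public static void main(String[] args) {
        Employee original = new Employee();
        original.put(Employee.NAME, "Alice"); // only NAME was modified
        Employee copy = DirtyAwareSerializer.roundTrip(original);
        System.out.println("before: " + original.dirtyBits()); // before: {0}
        System.out.println("after:  " + copy.dirtyBits());     // after:  {0}
    }
}

// Hypothetical stand-in for a generated entity; NOT Gora's Persistent API.
class Employee {
    static final int NAME = 0, SALARY = 1, SSN = 2, FIELD_COUNT = 3;
    private final Object[] values = new Object[FIELD_COUNT];
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    void put(int field, Object value) { values[field] = value; dirty.set(field); }

    Object get(int field) { return values[field]; }

    BitSet dirtyBits() { return (BitSet) dirty.clone(); }

    // Replaces the current dirty bits with a previously saved set.
    void restoreDirty(BitSet saved) { dirty.clear(); dirty.or(saved); }
}

// Hypothetical writer/reader pair: the dirty bits travel with the record
// instead of living in a schema field such as the removed __g__dirty.
class DirtyAwareSerializer {
    static Employee roundTrip(Employee e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(e.dirtyBits()); // internal state first (BitSet is Serializable)
                for (int i = 0; i < Employee.FIELD_COUNT; i++) out.writeObject(e.get(i));
            }
            Employee copy = new Employee();
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                BitSet savedDirty = (BitSet) in.readObject();
                for (int i = 0; i < Employee.FIELD_COUNT; i++) copy.put(i, in.readObject());
                copy.restoreDirty(savedDirty); // undo the bits put() set while reading
            }
            return copy;
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Writing the internal state as a separate prefix keeps it out of the entity schema, which is the concern raised about having the dirty field as a main field.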
This bug affects Nutch, for example, which loads only some fields in each of its
phases and serializes entities (from Map to Reduce). On deserialization all
fields are found to be "dirty", regardless of which fields were modified in the
Map, so all data in the datastore is overwritten (wiping out a lot of data:
downloaded content, parsed content, etc.).
This effect can be seen in
{{TestPersistentSerialization#testSerderEmployeeTwoFields}} when debugging into
{{TestIOUtils#testSerializeDeserialize}}. Setting breakpoints and inspecting
the entities shows that they are considered "equal" when their fields are
equal. This is fine as a definition of "equal", but another test must be added
to check that serialization and deserialization preserve the dirty state.
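The gap in the existing test can be illustrated with a small sketch: an {{equals()}} based only on field values cannot tell apart an entity with one dirty field from a copy where every field is dirty, so a separate dirty-state comparison is needed. {{Employee}}, {{dirtyBits()}} and {{sameDirtyState}} are hypothetical names, not Gora's API, and the "copy" here simply simulates what a value-only deserializer produces.

```java
import java.util.Arrays;
import java.util.BitSet;

// Run with `java DirtyStateCheck.java`; the source launcher uses the first class in the file.
class DirtyStateCheck {
    public static void main(String[] args) {
        Employee original = new Employee();
        original.put(Employee.NAME, "Alice"); // only NAME modified

        // Simulates a value-only deserializer: same values, every field set (and dirty).
        Employee copy = new Employee();
        for (int i = 0; i < Employee.FIELD_COUNT; i++) copy.put(i, original.get(i));

        System.out.println(original.equals(copy));          // true: equals() ignores dirty bits
        System.out.println(sameDirtyState(original, copy)); // false: the extra check catches it
    }

    // The additional check the test should assert after a serialize/deserialize round trip.
    static boolean sameDirtyState(Employee a, Employee b) {
        return a.dirtyBits().equals(b.dirtyBits());
    }
}

// Hypothetical stand-in for a generated entity; NOT Gora's Persistent API.
class Employee {
    static final int NAME = 0, SALARY = 1, SSN = 2, FIELD_COUNT = 3;
    private final Object[] values = new Object[FIELD_COUNT];
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    void put(int field, Object value) { values[field] = value; dirty.set(field); }

    Object get(int field) { return values[field]; }

    BitSet dirtyBits() { return (BitSet) dirty.clone(); }

    // Equality over field values only, as described above.
    @Override
    public boolean equals(Object o) {
        return o instanceof Employee && Arrays.equals(values, ((Employee) o).values);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(values); }
}
```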
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)