[
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068850#comment-14068850
]
Doug Cutting commented on AVRO-695:
-----------------------------------
I realize this is what you've implemented; however, I think the implementation
can be improved.
I would prefer that the circular reference not be a string, but rather be a
unique type, e.g., a record named CircularRef that contains an integer id field.
I agree that all object references in a schema need to be changed to unions
containing possible circular references. However I don't see that a target id
field needs to be added to all records. Rather the target id can be implicit,
based on the order that sub-objects are written, as follows.
In each call to DatumWriter#write, an IdentityHashMap<Object,Integer> table is
created. In #resolveUnion, when a union containing a record and a CircularRef
is passed a record, it looks for the record in the table. If the record
already exists in the table, then the CircularRef branch is returned, writing a
reference containing the integer value from the table. If the record does not
exist in the table, then an entry for it is added to the table whose value is
the current size of the table, and the record branch of the union is returned
and written.
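For illustration, the writer-side bookkeeping described above might look
roughly like the following. This is a minimal sketch under stated assumptions,
not Avro code: the Node class, the token strings ("record:", "ref:", "null")
and the write/writeNode methods are hypothetical stand-ins for a record, the
encoded union branches, and DatumWriter#write / #resolveUnion.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the writer-side id table; not Avro's actual API. */
public class CircularRefWriterSketch {

    /** Toy record with one self-typed field, standing in for an Avro record. */
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    /** Emits a flat token list: "record:<name>", "ref:<id>" or "null". */
    static List<String> write(Node root) {
        // One table per top-level write call, keyed by object identity
        // (not equals/hashCode), as the comment proposes.
        Map<Object, Integer> seen = new IdentityHashMap<>();
        List<String> out = new ArrayList<>();
        writeNode(root, seen, out);
        return out;
    }

    private static void writeNode(Node node, Map<Object, Integer> seen, List<String> out) {
        if (node == null) {
            out.add("null");
            return;
        }
        Integer id = seen.get(node);
        if (id != null) {
            // Already written: take the CircularRef branch and emit only the id.
            out.add("ref:" + id);
            return;
        }
        // First visit: the implicit id is the table size before insertion.
        seen.put(node, seen.size());
        out.add("record:" + node.name);
        writeNode(node.next, seen, out);
    }
}
```

In this sketch the record is entered into the table before its fields are
written, so a field that points back at a record still being written finds
the id already assigned.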
Then, each call to DatumReader#read creates a Vector<Object>. If #readRecord
reads a CircularRef, it returns the item at the index given by its id in the
vector; otherwise it appends a reference to the record it reads to the vector.
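The reader side can be sketched the same way. Again this is only an
illustration under assumptions, not Avro code: the Node class and the token
format ("record:<name>", "ref:<id>", "null") are hypothetical, an ArrayList
stands in for the Vector, and read/readNode stand in for DatumReader#read and
#readRecord. The sketch assumes each record is added to the list before its
fields are read, so that ids line up with a writer that assigns them in the
same order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the reader-side lookup list; not Avro's actual API. */
public class CircularRefReaderSketch {

    /** Toy record with one self-typed field, standing in for an Avro record. */
    static final class Node {
        String name;
        Node next;
    }

    /** Rebuilds an object graph from tokens "record:<name>", "ref:<id>", "null". */
    static Node read(List<String> tokens) {
        // One list per top-level read call; the index is the implicit target id.
        List<Node> seen = new ArrayList<>();
        return readNode(tokens.iterator(), seen);
    }

    private static Node readNode(Iterator<String> in, List<Node> seen) {
        String token = in.next();
        if (token.equals("null")) {
            return null;
        }
        if (token.startsWith("ref:")) {
            // CircularRef branch: return the record already read under that id.
            int id = Integer.parseInt(token.substring("ref:".length()));
            return seen.get(id);
        }
        Node node = new Node();
        node.name = token.substring("record:".length());
        // Register the record before reading its fields so that a reference
        // back to it from a nested field can be resolved.
        seen.add(node);
        node.next = readNode(in, seen);
        return node;
    }
}
```

Reading the tokens record:a, record:b, ref:0 with this sketch yields two
objects whose next fields point at each other, which is the cycle restored.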
I believe this can work, is more strongly-typed, and avoids adding new fields
to schemas, only inserting unions for record references.
> Cycle Reference Support
> -----------------------
>
> Key: AVRO-695
> URL: https://issues.apache.org/jira/browse/AVRO-695
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Affects Versions: 1.7.6
> Reporter: Moustapha Cherri
> Attachments: avro-1.4.1-cycle.patch.gz, avro-1.4.1-cycle.patch.gz,
> avro_circular_references.zip, avro_circular_refs_2014_06_14.zip,
> circular_refs_and_nonstring_map_keys_2014_06_25.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> This is a proposed implementation to add cycle reference support to Avro. It
> basically introduces a new type named Cycle. A Cycle contains a string
> representing the path to the other reference.
> For example, suppose we have an object of type Message that has a member named
> previous, also of type Message, and we have this hierarchy:
> message
>   previous : message2
> message2
>   previous : message2
> When serializing, the cycle path for "message2.previous" will be "previous".
> The implementation depends on ANTLR to evaluate those cycles at read time to
> resolve them. I used ANTLR 3.2. This dependency is not mandatory; I just used
> ANTLR to speed things up. I kept the ANTLR-generated code in this
> implementation, though it should instead be generated during the build.
> I only updated the Java code.
> I did not do full unit testing, but you can find the "avrotest.Main" class,
> which can be used as a preliminary test.
> Please do not hesitate to contact me for further clarification if this seems
> interesting.
> Best regards,
> Moustapha Cherri