[
https://issues.apache.org/jira/browse/AVRO-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728382#comment-13728382
]
Marshall Bockrath-Vandegrift commented on AVRO-1353:
----------------------------------------------------
I noticed that the trunk GenericData and my interface were similar, but a
few issues led me to keep a separate interface for this patch:
(1) GenericData only has a single-schema version of createDatumReader(), not
the version with separate writer & reader schemas that is needed. This could be fixed by adding
another method to the class, or just calling setSchema() on the new instances
as needed.
(2) In order to match the existing implementation, under the reflect data model
the DatumReader needs to use the class loader of the job configuration. The
obvious solution was to make the class Configurable, but that means it needs to
be a new Hadoop-related class and not just a class from the base Avro package.
This could be solved by adding a new class, e.g. HadoopReflectData
(ConfiguredReflectData?), with some complications described below.
(3) GenericData seems to be an implementation detail, and tying the data model
to the GenericData subclass seems to intertwine interface with implementation.
For example, my current implementation for Clojure data structures doesn't
include a subclass of GenericData. For another example, each subclass of
GenericData needs to correctly override the createDatumReader() etc. methods to
return instances of the correct classes; any existing subclasses which don't
override the new methods will silently produce incorrect results at runtime. I
think your latter proposal is the way to fix this long term. Short term -- if
you're okay with the initial implementation using GenericData directly, I'm not
going to argue with getting a feature I need sooner, but I'm also not excited
about changing code again later to implement a new interface.
If the short-term fixes I mention in the above points seem acceptable, I'll
work up another version of the patch.
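
The hazard described in point (3) can be sketched in plain Java. This is only
an illustration under stated assumptions: Reader, CustomReader, BaseData,
LegacyCustomData, UpdatedCustomData, DataModel and CustomModel are hypothetical
stand-ins, not Avro classes, and BaseData merely imitates the role GenericData
would play if the factory methods lived on it.

```java
// Stand-in types for illustration only; none of these are Avro classes.
class DataModelSketch {
    static class Reader {
        String kind() { return "generic"; }
    }

    static class CustomReader extends Reader {
        @Override String kind() { return "custom"; }
    }

    // Approach A: the factory method lives on a concrete base class.
    static class BaseData {
        Reader createDatumReader() { return new Reader(); }
    }

    // A subclass written before the factory method existed. It still compiles
    // unchanged and silently inherits the generic reader.
    static class LegacyCustomData extends BaseData { }

    // A subclass that was updated to override the factory method.
    static class UpdatedCustomData extends BaseData {
        @Override Reader createDatumReader() { return new CustomReader(); }
    }

    // Approach B: a separate interface. Every implementation must supply the
    // method, so leaving it out is a compile error, not a runtime surprise.
    interface DataModel {
        Reader createDatumReader();
    }

    static class CustomModel implements DataModel {
        @Override public Reader createDatumReader() { return new CustomReader(); }
    }

    public static void main(String[] args) {
        System.out.println(new LegacyCustomData().createDatumReader().kind());  // generic
        System.out.println(new UpdatedCustomData().createDatumReader().kind()); // custom
        System.out.println(new CustomModel().createDatumReader().kind());       // custom
    }
}
```

With the base-class approach, nothing flags LegacyCustomData as incomplete.
With the interface approach, the same omission fails at compile time.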
> Configurable Hadoop serialization in-memory representations
> -----------------------------------------------------------
>
> Key: AVRO-1353
> URL: https://issues.apache.org/jira/browse/AVRO-1353
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.5
> Reporter: Marshall Bockrath-Vandegrift
> Attachments: avro-1353-2.patch, avro-1353.patch
>
>
> As discussed on the Avro Users mailing list [1], it would be useful to allow
> configuration of the DatumReader/Writer implementations used by Hadoop Avro
> serialization, especially for non-Java JVM languages.
> [1]
> http://mail-archives.apache.org/mod_mbox/avro-user/201305.mbox/%3C87ehdh14qn.fsf%40zeno.atl.damballa%3E