[
https://issues.apache.org/jira/browse/AVRO-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240167#comment-14240167
]
Aaron Kimball commented on AVRO-1618:
-------------------------------------
Yes, that's correct.
I think it depends on what you mean by making the parsing process "real."
Converting a single JSON-encoded datum into a record is an (almost) trivial
process; effectively:
{code}
import json

class JsonDecoder:
    def decode(self, json_string):
        # Parse one JSON-encoded datum into plain Python objects.
        return json.loads(json_string)
{code}
... which is why I don't think such a method has been provided already :)
Converting a stream of concatenated JSON data strings into JSON objects via the
{{DatumReader}} interface is a much bigger/harder patch to write, for three
reasons:
* The trivial json decoder outlined above does not perform schema resolution;
that is done (I think?) in the DatumReader layer.
* {{DatumReader}} and {{BinaryDecoder}} are specialized to one another;
refactoring of the BinaryDecoder API and DatumReader implementation would be
required. I don't know this code particularly well and would need some
time to familiarize myself with it. {{DatumReader}} was not written to use a
generic "Decoder" interface (e.g., the DatumReader specifically calls methods
with names like {{decodeInt}} to establish the type of a union).
* Python's built-in {{json}} library and {{simplejson}} don't seem particularly
well-inclined toward a token-stream-based approach to JSON parsing; they seem
to want to munch whole strings into complete output objects. I think we'd have
to learn and depend on a new library like ijson
(https://pypi.python.org/pypi/ijson/) to make this happen...
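For what it's worth, the standard library can split a buffer of concatenated JSON values with {{json.JSONDecoder.raw_decode}}, but only once the whole buffer is already in memory as a string, which is the limitation described in the last bullet. A minimal sketch (the helper name {{iter_concatenated_json}} is made up for illustration, and it performs no schema resolution):

```python
import json


def iter_concatenated_json(buffer):
    """Yield each JSON value found in a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(buffer)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so do it by hand.
        while pos < end and buffer[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it in the buffer.
        value, pos = decoder.raw_decode(buffer, pos)
        yield value


for datum in iter_concatenated_json('{"a": 1} {"b": [2, 3]}\n4'):
    print(datum)
```

Each value is still parsed eagerly into a complete object; nothing here is token-based, so a decoder that reads incrementally from a stream would still need something like ijson.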
> Allow user to "clean up" unions into more conventional dicts in json encoding
> -----------------------------------------------------------------------------
>
> Key: AVRO-1618
> URL: https://issues.apache.org/jira/browse/AVRO-1618
> Project: Avro
> Issue Type: Improvement
> Components: python
> Affects Versions: 1.7.7
> Reporter: Aaron Kimball
> Assignee: Aaron Kimball
> Attachments: avro-1618.1.patch
>
>
> In Avro's JSON encoding, unions are implemented in a tagged fashion; walking
> through this data structure is somewhat cumbersome. It would be good to have
> a way of "decoding" this tagged-union data structure into a more conventional
> dict where the union element is directly present without the tag.
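To illustrate the request, here is a minimal sketch of such a "cleanup" pass. It is not the attached patch: the function name {{untag}} is hypothetical, the schema is assumed to be the plain dict/list structure obtained by calling {{json.loads}} on an .avsc file, and references to previously defined named types and namespaced record names are not handled.

```python
def untag(datum, schema):
    """Strip Avro JSON union tags from datum, guided by a parsed-JSON schema."""
    if isinstance(schema, list):
        # A union is written as a JSON array of branch schemas.
        if datum is None:
            return None  # the null branch is encoded as a bare null
        (tag, value), = datum.items()  # exactly one {branch_name: value} pair
        for branch in schema:
            if isinstance(branch, str):
                name = branch
            else:
                name = branch.get("name", branch["type"])
            if name == tag:
                return untag(value, branch)
        raise ValueError("no union branch named %r" % tag)
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {f["name"]: untag(datum[f["name"]], f["type"])
                    for f in schema["fields"]}
        if kind == "array":
            return [untag(item, schema["items"]) for item in datum]
        if kind == "map":
            return {k: untag(v, schema["values"]) for k, v in datum.items()}
    return datum  # primitives, enums, fixed: nothing to strip


schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "nickname", "type": ["null", "string"]},
        {"name": "scores", "type": {"type": "array", "items": ["null", "int"]}},
    ],
}
tagged = {"name": "a", "nickname": {"string": "b"}, "scores": [{"int": 1}, None]}
print(untag(tagged, schema))
```

The schema is needed because a tagged union value and a one-entry map look identical in the JSON encoding; the tags cannot be stripped safely from the data alone.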