[
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061520#comment-14061520
]
Michael Pigott commented on AVRO-457:
-------------------------------------
Thanks for the insight! I have modified the proposal accordingly. If we have
the URL to the XML Schema, we can encode that in the Avro schema. If we don't,
your recommendation makes a lot of sense. It is a bit more complicated for
complex XML types, as all, choice, and sequence groups may contain more groups
internally.
I propose to store group metadata as JSON objects, each with a "type"
field naming the node type: “all,” “choice,” “sequence,” or “element.”
The other fields give the minimum and maximum number of occurrences, plus a
“value” field. For groups, the “value” field is an array of the members of
that group. For elements, the “value” field is the element’s fully-qualified
XML name.
Here is an example:
{code}
{ "type": "sequence",
  "minOccurs": 0,
  "maxOccurs": "unbounded",
  "value": [
    { "type": "element",
      "minOccurs": 1,
      "maxOccurs": 1,
      "value": { "namespace": "http://www.w3.org/2001/XMLSchema",
                 "localPart": "complexType"
               }
    }
  ]
}
{code}
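To make the nesting concrete, here is a minimal Python sketch that walks metadata of this shape and lists every node with its depth and occurrence bounds. The function name `walk` and the `{namespace}localPart` rendering of element names are assumptions for illustration only; they are not part of the proposal or of any Avro API.

```python
import json

# Node types that contain child nodes in their "value" array.
GROUP_TYPES = {"all", "choice", "sequence"}


def walk(node, depth=0):
    """Yield (depth, type, minOccurs, maxOccurs, name) for each node.

    `name` is None for groups. For elements it is rendered as
    "{namespace}localPart", a convention assumed for this sketch.
    """
    kind = node["type"]
    if kind == "element":
        qname = node["value"]
        name = "{%s}%s" % (qname["namespace"], qname["localPart"])
        yield depth, kind, node["minOccurs"], node["maxOccurs"], name
    elif kind in GROUP_TYPES:
        yield depth, kind, node["minOccurs"], node["maxOccurs"], None
        # Groups may contain further groups, so recurse into each member.
        for child in node["value"]:
            yield from walk(child, depth + 1)
    else:
        raise ValueError("unknown node type: %r" % kind)


# The example metadata from this comment.
EXAMPLE = json.loads("""
{ "type": "sequence",
  "minOccurs": 0,
  "maxOccurs": "unbounded",
  "value": [
    { "type": "element",
      "minOccurs": 1,
      "maxOccurs": 1,
      "value": { "namespace": "http://www.w3.org/2001/XMLSchema",
                 "localPart": "complexType" }
    }
  ]
}
""")

rows = list(walk(EXAMPLE))
for depth, kind, lo, hi, name in rows:
    print("  " * depth + kind, lo, hi, name or "")
```

Because groups and elements share the same four fields, a single recursive function is enough to handle arbitrarily deep nesting.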
This isn't a perfect solution: attributes, elements, groups, and types can be
defined in separate sections of an XML Schema and reused across the document,
and that structure is not preserved. In addition, multiple schemas can be
referenced when describing an XML document. I think the only true way to
support lossless Avro Schema -> XML Schema conversion would be to encode the
entire XML Schema in JSON in the Avro schema. That said, the updated proposal
will allow us to create an XML Schema that validates the same documents that
the original schema would, so I think it is a reasonable compromise.
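As a rough illustration of how an XML Schema fragment might be rebuilt from the group metadata, the Python sketch below maps each metadata node to an `xs:all`, `xs:choice`, `xs:sequence`, or `xs:element` node using the standard `xml.etree.ElementTree` module. The function `to_xsd` and its `prefixes` argument (a namespace-to-prefix map supplied by the caller) are hypothetical, and emitting elements as `ref` attributes is an assumption of this sketch, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def to_xsd(node, prefixes):
    """Convert one metadata node (and its children) to an XML Schema node.

    `prefixes` maps namespace URIs to prefixes; the caller is assumed to
    declare those prefixes on the enclosing xs:schema element.
    """
    elem = ET.Element(
        "{%s}%s" % (XS, node["type"]),
        {
            "minOccurs": str(node["minOccurs"]),
            "maxOccurs": str(node["maxOccurs"]),
        },
    )
    if node["type"] == "element":
        qname = node["value"]
        # Assumption: reference a globally declared element by QName.
        elem.set("ref", "%s:%s" % (prefixes[qname["namespace"]],
                                   qname["localPart"]))
    else:
        # all, choice, and sequence groups hold their members in "value".
        for child in node["value"]:
            elem.append(to_xsd(child, prefixes))
    return elem


metadata = {
    "type": "sequence",
    "minOccurs": 0,
    "maxOccurs": "unbounded",
    "value": [
        {
            "type": "element",
            "minOccurs": 1,
            "maxOccurs": 1,
            "value": {"namespace": XS, "localPart": "complexType"},
        }
    ],
}

root = to_xsd(metadata, {XS: "xs"})
print(ET.tostring(root, encoding="unicode"))
```

The result describes only the group structure. Named types, attribute groups, and imports from the original schema are not recovered, which is the loss described above.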
> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Labels: gsoc
>
> It might be useful to have command-line tools that can read & write arbitrary
> XML data from & to Avro data files.