[ https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105832#comment-15105832 ]
Bram Biesbrouck commented on AVRO-457:
--------------------------------------
Please allow me to comment on this after having used Michael's project
(https://github.com/mikepigott/xml-to-avro) on the official (and fairly complex)
ebucore.xsd schema, version 1.6 (see https://tech.ebu.ch/MetadataEbuCore and
https://www.ebu.ch/metadata/schemas/EBUCore/ebucore.zip).
From a developer's point of view, the need for a tool like the one Michael has
written is very high: nearly all official ontologies publish their releases as
XML Schema (XSD) files. As with the XJC (and, by extension, JAXB) project, it's
important to have de facto standard projects that convert these schemas into
usable in-memory models. Having a reliable XSD->AVSC converter would be awesome.
I've played around with Michael's code and got it to generate an Avro schema
from the ebucore.xsd file. However, I had to make a lot of modifications to the
original XSD file because not all parts of the standard are implemented in
xml-to-avro (for one, elements with default, empty types crash the converter).
After having tried four solutions:
1) https://github.com/stealthly/xml-avro
2) https://github.com/mikepigott/xml-to-avro
3) https://github.com/nokia/Avro-Schema-Generator
4) https://github.com/FasterXML/jackson-dataformat-avro
I conclude that solution 1 is the best for now: it works out of the box without
modifications and generates a more type-safe schema than Michael's converter,
although for complex schemas like ebucore it introduces duplicate, numbered types
(e.g. Double1, Double2, ...).
All this to make a point: I, together with a lot of other developers, truly see
the need for an official XSD->AVSC converter, so please consider it. I can help
with testing, but I'm no XSD expert.
You might want to contact the folks at https://github.com/stealthly/xml-avro.
bram
> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.8
> Reporter: Doug Cutting
> Labels: gsoc
> Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch,
> AVRO-457.patch
>
>
> It might be useful to have command-line tools that can read & write arbitrary
> XML data from & to Avro data files.
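The issue asks for tools that handle arbitrary XML, not only XML backed by an
XSD. One schema-independent approach, sketched here purely as an assumption (it
is not taken from the attached AVRO-457 patches or from any of the projects
listed above), is a single recursive Avro record that can hold any element: its
tag name, its attributes as a string map, and an ordered list of children that
are either nested elements or text. The Python below uses only the standard
library; GENERIC_XML_SCHEMA, element_to_record and record_to_element are
hypothetical names, and actually writing the resulting dicts to an Avro data file
would need an Avro library, which is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema-independent Avro schema (JSON form, as a Python dict):
# one recursive record named "Element" that can hold any XML element.
GENERIC_XML_SCHEMA = {
    "type": "record",
    "name": "Element",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
        # Children keep document order; each is a nested Element or a text run.
        {"name": "children", "type": {"type": "array", "items": ["Element", "string"]}},
    ],
}


def element_to_record(elem):
    """Convert an ElementTree element into a dict shaped like GENERIC_XML_SCHEMA.

    Whitespace-only text (e.g. indentation) is dropped; other text is kept verbatim.
    """
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text)
    for child in elem:
        children.append(element_to_record(child))
        # ElementTree stores text that follows a child element as that child's tail.
        if child.tail and child.tail.strip():
            children.append(child.tail)
    return {"name": elem.tag, "attributes": dict(elem.attrib), "children": children}


def record_to_element(record):
    """Rebuild an ElementTree element from a dict produced by element_to_record."""
    elem = ET.Element(record["name"], dict(record["attributes"]))
    last_child = None
    for item in record["children"]:
        if isinstance(item, str):
            # Text before the first child is the element's text; later text is a tail.
            if last_child is None:
                elem.text = (elem.text or "") + item
            else:
                last_child.tail = (last_child.tail or "") + item
        else:
            last_child = record_to_element(item)
            elem.append(last_child)
    return elem


if __name__ == "__main__":
    doc = ET.fromstring('<a x="1">hi<b>there</b>tail</a>')
    record = element_to_record(doc)
    print(record)
    print(ET.tostring(record_to_element(record), encoding="unicode"))
```

Because every document maps onto the same schema, no XSD is needed, but the
result is far less type-safe than what an XSD->AVSC converter produces, which is
the gap the comment above is asking to fill.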
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)