[ 
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105832#comment-15105832
 ] 

Bram Biesbrouck commented on AVRO-457:
--------------------------------------

Please allow me to comment on this after having used Michael's project (from 
https://github.com/mikepigott/xml-to-avro) on the official (and fairly complex) 
ebucore.xsd schema, version 1.6 (see https://tech.ebu.ch/MetadataEbuCore and 
https://www.ebu.ch/metadata/schemas/EBUCore/ebucore.zip).

From a developer's point of view, the need for the tool Michael has written is 
very high: nearly all official ontologies release their versions as XML Schema 
(XSD) files. Just as with the XJC (and by extension the JAXB) project, it's 
important to have de facto standard projects to convert them to working memory 
models. Having a reliable XSD->AVSC converter would be awesome.

I've played around with Michael's code and got it to successfully generate an 
Avro schema from the ebucore.xsd file. However, I had to make a lot of 
modifications to the original file because not all of the XSD standard is 
implemented in xml-to-avro (for one, elements with default, empty types crash 
the converter).
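
To illustrate the kind of declaration involved, here is a minimal sketch, 
assuming that "default, empty types" refers to xs:element declarations with no 
type attribute and no inline type (which XML Schema treats as xs:anyType). The 
helper name untyped_elements and the sample schema are hypothetical; they are 
not part of xml-to-avro or ebucore.xsd. Something like this could be used to 
list the declarations that may need attention before running a converter:

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema itself.
XS = "http://www.w3.org/2001/XMLSchema"


def untyped_elements(xsd_text):
    """Return names of xs:element declarations without an explicit type.

    Hypothetical helper: it skips references (ref=...) and substitution
    group members, because those take their type from another declaration.
    """
    root = ET.fromstring(xsd_text)
    names = []
    for el in root.iter(f"{{{XS}}}element"):
        if "ref" in el.attrib or "type" in el.attrib or "substitutionGroup" in el.attrib:
            continue
        # An inline xs:complexType or xs:simpleType child also counts as a type.
        has_inline_type = any(
            child.tag in (f"{{{XS}}}complexType", f"{{{XS}}}simpleType")
            for child in el
        )
        if not has_inline_type:
            names.append(el.get("name"))
    return names


# Made-up sample schema, not taken from ebucore.xsd.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="note"/>
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="extension"/>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(untyped_elements(SAMPLE))  # prints ['note', 'extension']
```

Whether these are exactly the declarations that trip up xml-to-avro is an 
assumption on my part; the sketch only finds candidates to check by hand.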

After having tried four solutions:
1) https://github.com/stealthly/xml-avro
2) https://github.com/mikepigott/xml-to-avro
3) https://github.com/nokia/Avro-Schema-Generator
4) https://github.com/FasterXML/jackson-dataformat-avro

I conclude that solution 1 is the best for now, because it works out of the box 
without modifications and generates a more type-safe schema (than Michael's 
converter), although for complex schemas like ebucore, double types are 
introduced (e.g. Double1, Double2, ...).

All this to make a point: I, together with a lot of other developers, truly see 
the need for an official XSD->AVSC converter, so please consider it. I can help 
with testing, but I'm no XSD expert. 
You might want to contact the folks at https://github.com/stealthly/xml-avro.

bram

> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-457
>                 URL: https://issues.apache.org/jira/browse/AVRO-457
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.8
>            Reporter: Doug Cutting
>              Labels: gsoc
>         Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, 
> AVRO-457.patch
>
>
> It might be useful to have command-line tools that can read & write arbitrary 
> XML data from & to Avro data files.



