[ 
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108297#comment-15108297
 ] 

Bram Biesbrouck commented on AVRO-457:
--------------------------------------

Hi [~rdblue] and [~mpigott],

I think I might have found a better approach to this.
To work with XSD schemas, most Java users rely on 
[XJC|https://jaxb.java.net/2.2.4/docs/xjc.html] to convert an XSD to POJOs. It 
is a mature tool and its output is very good.
Because it makes sense to reuse a common POJO codebase to (de)serialize to 
JSON/XML/AVRO, starting from those POJOs might be a better basis for a robust 
XSD->AVRO converter, also because parsing and interpreting raw XSD is quite 
error prone.

Fortunately, a lot of work has been done already. Take a look at [this 
project|https://github.com/fge/json-schema-core].
It generates a [JSON schema|http://json-schema.org/] from a POJO class (and, 
recursively, all of its members).
Now the best part: the same developers also wrote [this 
project|https://github.com/fge/json-schema-avro], which converts a JSON schema 
to an AVRO schema. The json->avro converter is not production ready yet, but 
it has a very nice codebase to start from. [This 
class|https://github.com/fge/json-schema-avro/blob/master/src/main/java/com/github/fge/jsonschema2avro/AvroWriterProcessor.java]
 is a good entry point to its inner workings.

I'm currently trying to find some time to work on it, but progress is slow. I 
did manage to convert the EBUCore XSD schema to a JSON schema, though. 
The next step (JSON->AVRO) is more difficult, I'm afraid. Hence my question: do 
the AVRO developers have any experience with converting JSON schemas into the 
(narrower) AVRO schema structure? It would be interesting to investigate in 
general, because JSON validation is becoming more and more relevant these days.
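
To make the JSON->AVRO question concrete, here is a minimal Python sketch of 
what such a conversion could look like for a very small subset of JSON Schema. 
It is not taken from json-schema-avro: the function name to_avro, the type 
mapping, the generated record names and the sample schema are all illustrative 
assumptions. It only shows why the target is narrower: anything outside plain 
primitives, arrays and objects is rejected instead of being converted.

```python
import json

# Assumed mapping from JSON Schema primitive types to Avro primitive types.
# "integer" -> "long" and "number" -> "double" are choices, not requirements.
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
    "null": "null",
}


def to_avro(schema, name="Root"):
    """Convert a small subset of JSON Schema into an Avro schema (plain dicts)."""
    kind = schema.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        return {"type": "array", "items": to_avro(schema["items"], name + "Item")}
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        for prop, sub in schema.get("properties", {}).items():
            # Nested records need a name; derive a hypothetical one from the path.
            avro_type = to_avro(sub, name + prop[:1].upper() + prop[1:])
            if prop in required or avro_type == "null":
                fields.append({"name": prop, "type": avro_type})
            else:
                # Optional properties become a union with null, defaulting to null.
                fields.append(
                    {"name": prop, "type": ["null", avro_type], "default": None}
                )
        return {"type": "record", "name": name, "fields": fields}
    # Keywords such as oneOf/anyOf or a missing "type" are not handled here.
    raise ValueError("unsupported JSON Schema construct: %r" % (schema,))


# Hypothetical input, standing in for a schema generated from a POJO.
person = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name"],
}

avro_schema = to_avro(person, "Person")
print(json.dumps(avro_schema, indent=2))
```

The sketch assumes that property names are already valid AVRO names and that 
every schema node carries a single "type" keyword; a real converter would have 
to decide what to do when neither holds.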

b.

> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-457
>                 URL: https://issues.apache.org/jira/browse/AVRO-457
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.8
>            Reporter: Doug Cutting
>              Labels: gsoc
>         Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, 
> AVRO-457.patch, ebucore.json
>
>
> It might be useful to have command-line tools that can read & write arbitrary 
> XML data from & to Avro data files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
