[
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108297#comment-15108297
]
Bram Biesbrouck commented on AVRO-457:
--------------------------------------
Hi [~rdblue] and [~mpigott],
I think I might have found a better approach to this...
To work with XSD schemas, the vast majority of Java users rely on
[XJC|https://jaxb.java.net/2.2.4/docs/xjc.html] to convert an XSD into POJOs.
It is a mature tool and its output is very good.
It makes sense to reuse a common POJO codebase to (de)serialize to
JSON/XML/Avro, and parsing and interpreting raw XSD is quite error prone, so
this might be a better starting point for a robust XSD->Avro converter.
Fortunately, a lot of work has already been done. Take a look at [this
project|https://github.com/fge/json-schema-core].
It generates a [JSON Schema|http://json-schema.org/] from a POJO class (and,
recursively, from all its members).
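To make the "POJO class and, recursively, its members" idea concrete, here is a minimal reflection-based sketch. It is not code from the linked project: the class name {{PojoSchemaSketch}}, the method {{schemaFor}} and the {{Person}}/{{Address}} POJOs are hypothetical stand-ins for XJC output, and only a handful of field types are handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class PojoSchemaSketch {
    // Hypothetical POJOs, standing in for classes generated by XJC.
    static class Address { String city; int zip; }
    static class Person { String name; boolean active; Address address; }

    /**
     * Builds a JSON-Schema-like map for a class, recursing into fields whose
     * type is not one of the simple types below.
     * Assumption: the type graph is acyclic (a self-referencing POJO would
     * recurse forever in this sketch).
     */
    static Map<String, Object> schemaFor(Class<?> type) {
        Map<String, Object> schema = new LinkedHashMap<>();
        if (type == String.class) {
            schema.put("type", "string");
        } else if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) {
            schema.put("type", "integer");
        } else if (type == double.class || type == Double.class
                || type == float.class || type == Float.class) {
            schema.put("type", "number");
        } else if (type == boolean.class || type == Boolean.class) {
            schema.put("type", "boolean");
        } else {
            // Everything else is treated as an object with one property per field.
            Map<String, Object> properties = new LinkedHashMap<>();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip constants and compiler-generated fields
                }
                properties.put(field.getName(), schemaFor(field.getType()));
            }
            schema.put("type", "object");
            schema.put("properties", properties);
        }
        return schema;
    }

    public static void main(String[] args) {
        // Prints the nested map; a real tool would serialize it as JSON.
        System.out.println(schemaFor(Person.class));
    }
}
```

A real generator would also have to deal with collections, enums, inheritance and the JAXB annotations that XJC emits, none of which this sketch attempts.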
Now the best part: the same developers also wrote [this
project|https://github.com/fge/json-schema-avro], which converts a JSON Schema
to an Avro schema. The JSON->Avro converter is not production ready yet, but
its codebase is a very nice starting point. [This
class|https://github.com/fge/json-schema-avro/blob/master/src/main/java/com/github/fge/jsonschema2avro/AvroWriterProcessor.java]
is a good entry point to its inner workings.
I'm currently trying to find some time to work on it, but progress is slow. I
did manage to convert the EBUCore XSD schema to a JSON Schema, though.
The next step (JSON->Avro) is more difficult, I'm afraid. Hence my question: do
the Avro developers have any experience with converting JSON Schemas into the
narrower Avro schema structure? This would be interesting to investigate in
general, because JSON validation is becoming more and more relevant these days.
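To show what "narrower" means in practice, here is a rough sketch of the mapping, under stated assumptions. It is not the json-schema-avro code: {{JsonSchemaToAvroSketch}}, {{toAvro}}, {{toRecord}} and {{exampleSchema}} are made-up names, schemas are plain maps instead of parsed JSON, and only four scalar types plus objects are covered. It illustrates three constraints on the Avro side: every record needs a name, an optional property has to become a union with "null", and a schema without a single concrete type has no direct equivalent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonSchemaToAvroSketch {
    /** Maps a JSON-Schema-like map to an Avro type (a type name or a record map). */
    static Object toAvro(String name, Map<String, Object> jsonSchema) {
        Object type = jsonSchema.get("type");
        if (!(type instanceof String)) {
            // JSON Schema allows a missing type or a list of types; this sketch does not.
            throw new IllegalArgumentException("a single 'type' is required: " + name);
        }
        switch ((String) type) {
            case "string":  return "string";
            case "integer": return "long";    // assumption: widest Avro integer type
            case "number":  return "double";
            case "boolean": return "boolean";
            case "object":  return toRecord(name, jsonSchema);
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> toRecord(String name, Map<String, Object> jsonSchema) {
        Map<String, Object> properties = (Map<String, Object>)
                jsonSchema.getOrDefault("properties", Collections.emptyMap());
        List<String> required = (List<String>)
                jsonSchema.getOrDefault("required", Collections.emptyList());

        List<Object> fields = new ArrayList<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            // Nested records need a name of their own; derive one from the path.
            Object fieldType = toAvro(name + "_" + entry.getKey(),
                    (Map<String, Object>) entry.getValue());
            Map<String, Object> field = new LinkedHashMap<>();
            field.put("name", entry.getKey());
            if (required.contains(entry.getKey())) {
                field.put("type", fieldType);
            } else {
                // Optional property: union with "null" first so the default can be null.
                field.put("type", Arrays.asList("null", fieldType));
                field.put("default", null);
            }
            fields.add(field);
        }

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("type", "record");
        record.put("name", name);
        record.put("fields", fields);
        return record;
    }

    /** A small hand-built input: an object with a required string and an optional integer. */
    static Map<String, Object> exampleSchema() {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("name", Collections.singletonMap("type", "string"));
        properties.put("age", Collections.singletonMap("type", "integer"));
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        schema.put("required", Collections.singletonList("name"));
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(toAvro("Person", exampleSchema()));
    }
}
```

Things this sketch leaves out, and which are probably the hard part: property names that are not valid Avro names, "additionalProperties", "oneOf"/"anyOf", arrays, and value constraints such as "pattern" or "minimum", which Avro schemas cannot express.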
b.
> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.8
> Reporter: Doug Cutting
> Labels: gsoc
> Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch,
> AVRO-457.patch, ebucore.json
>
>
> It might be useful to have command-line tools that can read & write arbitrary
> XML data from & to Avro data files.