Hello All,

I would like to introduce a project we have been working on that uses Avro, and to get some feedback on it.

1. AvroGraph
------------

We have created an Avro to GraphML serializer/deserializer. This lets us visualize Avro schemas as a graph to understand the relations between all the data points. This will later lead to the creation of lineage graphs, among other things.

- Implementation
  o Similar to the JSON serializer/deserializer.
  o Apache TinkerPop is used as the graph library, so the graph can be persisted to a variety of graph stores.
  o Support for schema evolution between multiple versions of the Avro schemas.
  o Lots of unit tests and documentation.

2. Avro Transformation Language
-------------------------------

This is a YAML-based specification that transforms data in a source schema to a target schema. For this we introduce a "transform node" to join the two schemas.

- The following operations can be done during the source-to-target data transformation:
  o Copy source leaves to target leaves.
  o Copy source parent nodes to target parent nodes, only if the subgraphs have the same structure.
  o Concatenate source nodes and copy the result to a target node.
  o User-defined operations on the transforms.
  o Extract certain leaves from the source and call an external endpoint for data manipulation, e.g. Spark / HTTP.

Let me know how/if these components would benefit the Apache Avro project; if they would, we would like to contribute them.

-Ani
