[
https://issues.apache.org/jira/browse/AVRO-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339950#comment-17339950
]
Feroze Daud commented on AVRO-3026:
-----------------------------------
Hi!
Finally had a chance to try it out. The `@tags` annotation works, but there
doesn't seem to be a way to add a `doc` attribute, as allowed by the AVSC
schema, to the output file.
How do we achieve that?
For example, this causes an error:
{noformat}
record employee {
string @doc("employee name") name;
boolean active;
long salary;
}
{noformat}
Output:
{noformat}
$ java -jar ~/DevTools/avro-tools-1.10.1.jar idl employee.avdl | jq .
Exception in thread "main" org.apache.avro.AvroRuntimeException: Can't set reserved property: doc
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:281)
    at org.apache.avro.JsonProperties.access$000(JsonProperties.java:121)
    at org.apache.avro.JsonProperties$1.addProp(JsonProperties.java:127)
    at org.apache.avro.util.internal.Accessor.addProp(Accessor.java:101)
    at org.apache.avro.compiler.idl.Idl.VariableDeclarator(Idl.java:664)
    at org.apache.avro.compiler.idl.Idl.FieldDeclaration(Idl.java:606)
    at org.apache.avro.compiler.idl.Idl.RecordDeclaration(Idl.java:569)
    at org.apache.avro.compiler.idl.Idl.NamedSchemaDeclaration(Idl.java:153)
    at org.apache.avro.compiler.idl.Idl.ProtocolBody(Idl.java:402)
    at org.apache.avro.compiler.idl.Idl.ProtocolDeclaration(Idl.java:227)
    at org.apache.avro.compiler.idl.Idl.CompilationUnit(Idl.java:117)
    at org.apache.avro.tool.IdlTool.run(IdlTool.java:61)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)
{noformat}
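The trace suggests that `doc` is rejected because it is a reserved property name, not because annotations fail in general. As a rough illustration only, here is a small Python model of that kind of check. It is not Avro's code: `RESERVED_FIELD_PROPS` and `add_prop` are hypothetical names, and the exact set of reserved names is an assumption.

```python
# Hypothetical model of a reserved-property check; not Avro's implementation.
# Assumption: the set of property names that a field treats as reserved.
RESERVED_FIELD_PROPS = {"name", "type", "doc", "default", "order", "aliases"}


def add_prop(props: dict, name: str, value) -> None:
    """Store a custom property unless its name is reserved."""
    if name in RESERVED_FIELD_PROPS:
        raise ValueError(f"Can't set reserved property: {name}")
    props[name] = value


field_props = {}
add_prop(field_props, "tags", ["Name"])  # custom property: accepted

try:
    add_prop(field_props, "doc", "employee name")  # reserved name: rejected
    doc_error = None
except ValueError as exc:
    doc_error = str(exc)
```

Under this model, a custom key such as `tags` passes through, while `doc` would need a dedicated mechanism instead of a generic annotation.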
> Allow custom annotations in IDL files and support translating them to AVSC
> Avro.
> --------------------------------------------------------------------------------
>
> Key: AVRO-3026
> URL: https://issues.apache.org/jira/browse/AVRO-3026
> Project: Apache Avro
> Issue Type: New Feature
> Components: spec
> Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.1
> Reporter: Feroze Daud
> Priority: Major
>
> h2. Introduction
> Our company has standardized on Avro schemas for all data ingestion and
> storage. As part of this, and to satisfy CCPA, we need to be able to tag
> records and fields appropriately if they contain PI, non-PI information, etc.
> Avro AVSC files, being valid JSON, can easily be modified to add tags that
> will be used by downstream processors and won't interfere with Avro
> itself (generating POJOs, serialization, deserialization, etc.).
> One such key we chose is simply called *tags*. Its example usage is shown
> below.
> {code:java}
> {
> "type": "record",
> "name": "PropertyOwner",
> "namespace": "com.acme.Property",
> "tags": ["PI", "PII" ],
> "fields": [
> {
> "name": "FullName",
> "type": "string",
> "tags": ["Name"]
> },
> {
> "name": "PhoneNumber",
> "type": "string",
> "tags": ["Phone"]
> }]
> }{code}
>
> These tags can be processed by downstream processors, and the data landing
> in a data lake or database can be tagged appropriately.
>
> h2. Problem Description
> While tagging works fine for AVSC, because adding extra attributes doesn't
> make it invalid, we have a problem when using IDL to author schemas. The IDL
> spec does not provide a way to add extra tags that are copied over to the
> Avro schema.
>
> h2. Proposal
> I propose that we allow a special *@annotation* tag that can be
> applied to records and fields. Whatever is in this annotation should be
> copied verbatim to the output AVSC.
> For example:
> {code:java}
> @annotation("tags", "[\"PI\", \"Non PI\"]")
> record Employee {
> @annotation("tags", "[\"Name\"]")
> string fullName;
> boolean active = true;
> long salary;
> @annotation("tags", "[\"Phone\"]")
> string phone;
> } {code}
>
> would generate an Avro schema as follows:
>
> {code:java}
> {
> "type": "record",
> "name": "Employee",
> "tags": ["PI", "Non PI"],
> "fields": [
> {
> "name": "fullName",
> "type": "string",
> "tags": ["Name"]
> },
> {
> "name": "active",
> "type": "boolean",
> "default": true
> },
> {
> "name": "salary",
> "type": "long"
> },
> {
> "name": "phone",
> "type": "string",
> "tags": ["Phone"]
> }]
> }{code}
>
> As you can see, we don't need to validate that the *@annotation* value is
> well-formed JSON. It just takes a string, and we render it into the output
> JSON as-is.
> @annotation("foo", "[\"bar\"]") -> "foo": ["bar"]
> @annotation("foo", "{\"bar\": \"jar\"}") -> "foo": {"bar": "jar"}
>
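To make the proposal concrete, here is a minimal Python sketch of the rendering rule described above, plus a downstream consumer that reads the resulting tags. `render_annotation` and `collect_field_tags` are hypothetical helper names and not part of Avro. The sketch assumes the annotation value is spliced into the output as text, with no validation.

```python
import json


def render_annotation(key: str, raw_value: str) -> str:
    """Render one annotation as a JSON member, copying raw_value verbatim."""
    # Only the key is quoted; the value is spliced in without validation,
    # matching the "just render the string" rule in the proposal.
    return f"{json.dumps(key)}: {raw_value}"


def collect_field_tags(schema: dict) -> dict:
    """Map each field name to its list of tags (empty when untagged)."""
    return {f["name"]: f.get("tags", []) for f in schema.get("fields", [])}


member = render_annotation("foo", '["bar"]')

# Assemble a tiny record schema as text from rendered annotations,
# then parse it the way a downstream processor might.
schema_text = (
    '{"type": "record", "name": "Employee", '
    + render_annotation("tags", '["PI", "Non PI"]')
    + ', "fields": [{"name": "fullName", "type": "string", '
    + render_annotation("tags", '["Name"]')
    + '}, {"name": "salary", "type": "long"}]}'
)
schema = json.loads(schema_text)
field_tags = collect_field_tags(schema)
```

Because the value is copied without checks, a malformed annotation string would only surface as an error when something later parses the generated AVSC, which is the trade-off the proposal accepts.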