[
https://issues.apache.org/jira/browse/AVRO-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278281#comment-17278281
]
Ryan Skraba commented on AVRO-3026:
-----------------------------------
I agree that the spec [isn't super
clear|https://avro.apache.org/docs/current/idl.html#minutiae_annotations]
(implying that only known annotations are accepted), but it says all
annotations before the {{string}} are copied to the field's type schema, and
all of the annotations before {{fullName}} are copied to the field itself.
Using [~opwvhk]'s example and technique:
{code}
record Employee {
@tags(["one"]) string @tags(["two"]) fullName;
}
{code}
would result in the following schema (in JSON):
{code}
{
"type" : "record",
"name" : "Employee",
"fields" : [ {
"name" : "fullName",
"type" : {
"type" : "string",
"tags" : [ "one" ]
},
"tags" : [ "two" ]
} ]
}
{code}
It appears that the annotation before the field name is what you're looking for!
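For a downstream processor, the practical difference is where each {{tags}} value ends up in the generated JSON. The following is a minimal sketch in Python using only the standard {{json}} module; the helper name {{field_tags}} and its return shape are hypothetical, chosen for illustration, and are not part of Avro:

```python
import json

# The schema from the example above, as generated from the IDL record.
AVSC = """
{
  "type": "record",
  "name": "Employee",
  "fields": [ {
    "name": "fullName",
    "type": { "type": "string", "tags": ["one"] },
    "tags": ["two"]
  } ]
}
"""


def field_tags(schema, key="tags"):
    """Map each field name to its field-level and type-level values of `key`.

    Hypothetical helper for illustration only; not an Avro API.
    """
    result = {}
    for field in schema.get("fields", []):
        ftype = field["type"]
        # Only an object-form type ({"type": ...}) can carry extra properties;
        # a bare name such as "string" or a union (a JSON list) cannot.
        type_level = ftype.get(key, []) if isinstance(ftype, dict) else []
        result[field["name"]] = {
            "field": field.get(key, []),  # annotation placed before the field name
            "type": type_level,           # annotation placed before the type
        }
    return result


tags = field_tags(json.loads(AVSC))
print(tags)  # {'fullName': {'field': ['two'], 'type': ['one']}}
```

A processor that only inspects the field object would see {{["two"]}} here, which is why the annotation before the field name is the relevant one for field-level tagging.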
> Allow custom annotations in IDL files and support translating them to AVSC
> Avro.
> --------------------------------------------------------------------------------
>
> Key: AVRO-3026
> URL: https://issues.apache.org/jira/browse/AVRO-3026
> Project: Apache Avro
> Issue Type: New Feature
> Components: spec
> Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.1
> Reporter: Feroze Daud
> Priority: Major
>
> h2. Introduction
> Our company has standardized on Avro schemas for all data ingestion and
> storage. As part of this, and to satisfy CCPA, we need to be able to tag the
> records and fields appropriately if they have PI, or Non PI information, etc.
> Avro AVSC files, being valid JSON, can easily be modified to add tags that
> will be used by downstream processors and won't interfere with Avro
> itself (generating POJOs, serialization, deserialization, etc.).
> One such key we chose is simply called *tags*. Its example usage is shown
> below.
> {code:java}
> {
> "type": "record",
> "name": "PropertyOwner",
> "namespace": "com.acme.Property",
> "tags": ["PI", "PII" ],
> "fields": [
> {
> "name": "FullName",
> "type": "string",
> "tags": ["Name"]
> },
> {
> "name": "PhoneNumber",
> "type": "string",
> "tags": ["Phone"]
> }]
> }{code}
>
> These tags can be processed by downstream processors, so that data landing in
> a data lake or database can be tagged appropriately.
>
> h2. Problem Description
> While tagging works fine for AVSC, because adding extra properties doesn't make
> it invalid, we have a problem when using IDL to author schemas. The IDL spec
> does not provide a way to add extra tags that are copied over to the Avro
> schema.
>
> h2. Proposal
> I propose that we allow a special *@annotation* tag that can be
> applied to records and fields. Whatever is in this annotation should be
> copied verbatim to the output AVSC.
> For example:
> {code:java}
> @annotation("tags", "[\"PI\", \"Non PI\"]")
> record Employee {
> @annotation("tags", "[\"Name\"]")
> string fullName;
> boolean active = true;
> long salary;
> @annotation("tags", "[\"Phone\"]")
> string phone;
> } {code}
>
> would generate an Avro schema as follows:
>
> {code:java}
> {
> "type": "record",
> "name": "Employee",
> "tags": ["PI", "Non PI"],
> "fields": [
> {
> "name": "fullName",
> "type": "string",
> "tags": ["Name"]
> },
> {
> "name": "active",
> "type": "boolean",
> "default": true
> },
> {
> "name": "salary",
> "type": "long"
> },
> {
> "name": "phone",
> "type": "string",
> "tags": ["Phone"]
> }]
> }{code}
>
> As you can see, we don't need to validate that the *@annotation* value is
> well-formed JSON. It just takes a string, and we render it as-is into the output
> JSON.
> @annotation("foo", "[\"bar\"]") -> "foo": ["bar"]
> @annotation("foo", "\{\"bar\": \"jar\"}") -> "foo": {"bar": "jar"}
>