[
https://issues.apache.org/jira/browse/AVRO-2299?focusedWorklogId=752090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752090
]
ASF GitHub Bot logged work on AVRO-2299:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Apr/22 03:04
Start Date: 04/Apr/22 03:04
Worklog Time Spent: 10m
Work Description: twmb commented on code in PR #805:
URL: https://github.com/apache/avro/pull/805#discussion_r841335930
##########
doc/src/content/xdocs/spec.xml:
##########
@@ -1310,6 +1310,92 @@
</ul>
</section>
+ <section>
+ <title>Standard Canonical Form for Schemas</title>
+
+ <p>One of defined way to normalize the avro schema using
+ <em>Standard Canonical Form Transformation</em>. This involves
+ stripping unwanted properties and maintain same canonical
+ ordering. The canonical ordering involves ordering avro
+ reserved properties followed by custom properties if mentioned while
+ transforming. Normalization schema which helps to reduce the
+ total memory size of schema (removed unwanted properties and whitespace)
+ while transfer avro schema between two system and also reduce the parsing
+ time for compatibility check and schema evolution.
+ </p>
+
+ <p><em>Standard Canonical Form</em> is a transformation of a schema
+ into standard canonical ordered. It contains only avro reserved
+ properties <code>"name", "type", "fields", "symbols", "items", "values",
+ "logicalType", "size", "order", "doc", "aliases", "default"</code>
+ and <em>other (custom properties)</em> schema properties.
+ </p>
+
+ <section>
+ <title>Transforming into Standard Canonical Form</title>
+
+ <p>Assuming an input schema (in JSON form) that's already
+ UTF-8 text for a <em>valid</em> Avro schema (including all
+ quotes as required by JSON), the following transformations
+ will produce its Standard Canonical Form:</p>
+ <ul>
+ <li> [PRIMITIVES] Convert primitive schemas to their simple
+ form (e.g., <code>int</code> instead of
+ <code>{"type":"int"}</code>).</li>
+
+ <li> [FULLNAMES] Replace short names with fullnames, using
+ applicable namespaces to do so. Then eliminate
+ <code>namespace</code> attributes, which are now redundant.</li>
+
+ <li> [STRIP] Keep only attributes that are relevant to
+ reserved properties, which are:
+ <code>type</code>, <code>name</code>,
Review Comment:
`size` is only relevant for `fixed`; it should not be present in any other
type.
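To illustrate the point, here is a minimal sketch of a type-aware strip step. It is not code from the PR: the function name `drop_misplaced_size` is hypothetical, and the schema is assumed to be already parsed from JSON into Python dicts and lists.

```python
def drop_misplaced_size(schema):
    """Return a copy of `schema` with `size` removed from every non-fixed type.

    Hypothetical helper for illustration; not part of Avro or of PR #805.
    """
    if isinstance(schema, list):
        # A union, or the list of fields of a record.
        return [drop_misplaced_size(item) for item in schema]
    if not isinstance(schema, dict):
        # A primitive type name or a reference to a named type.
        return schema

    out = {}
    for key, value in schema.items():
        # `size` is only meaningful when this object's type is "fixed".
        if key == "size" and schema.get("type") != "fixed":
            continue
        # Recurse only into attributes that can hold nested schemas.
        if key in ("type", "items", "values", "fields"):
            value = drop_misplaced_size(value)
        out[key] = value
    return out


fixed = {"type": "fixed", "name": "md5", "size": 16}
record = {
    "type": "record",
    "name": "r",
    "size": 3,  # misplaced: a record has no size
    "fields": [
        {"name": "hash", "type": fixed},
        {"name": "n", "type": "int", "size": 4},  # misplaced on a field
    ],
}
cleaned = drop_misplaced_size(record)
```

In this sketch `size` survives only inside the nested `fixed` schema; the copies on the record and on the `int` field are dropped.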
Issue Time Tracking
-------------------
Worklog Id: (was: 752090)
Time Spent: 20m (was: 10m)
> Get Plain Schema
> ----------------
>
> Key: AVRO-2299
> URL: https://issues.apache.org/jira/browse/AVRO-2299
> Project: Apache Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.9.0, 1.8.2, 1.9.1
> Reporter: Rumeshkrishnan Mohan
> Assignee: Doug Cutting
> Priority: Major
> Labels: features, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
> "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> Avro also supports user-defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)?
> Input Schema:
> {code:java}
> {
> "name": "testSchema",
> "namespace": "com.avro",
> "type": "record",
> "fields": [
> {
> "name": "email",
> "type": "string",
> "doc": "email id",
> "user_field_prop": "xxxxx"
> }
> ],
> "user_schema_prop": "xxxxxx"
> }
> {code}
> Expected Plain Schema:
> {code:java}
> {
> "name": "testSchema",
> "namespace": "com.avro",
> "type": "record",
> "fields": [
> {
> "name": "email",
> "type": "string",
> "doc": "email id"
> }
> ]
> }
> {code}
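The request above can be sketched outside of Avro itself. The following is a rough illustration only, not the Avro Java API: `plain_schema` and `RESERVED_KEYS` are hypothetical names, and the reserved-key set is copied from the panel in the issue description.

```python
import json

# Reserved keys as listed in the issue description (an assumption of this
# sketch; the spec discussion in the PR uses a slightly different list).
RESERVED_KEYS = {
    "doc", "fields", "items", "name", "namespace",
    "size", "symbols", "values", "type", "aliases", "default",
}


def plain_schema(node):
    """Return a copy of a parsed schema with all non-reserved keys removed."""
    if isinstance(node, list):
        # A union, a list of fields, or a list of symbols/aliases.
        return [plain_schema(item) for item in node]
    if not isinstance(node, dict):
        # Strings, numbers, and other JSON scalars are kept unchanged.
        return node

    out = {}
    for key, value in node.items():
        if key not in RESERVED_KEYS:
            continue  # user-defined property: drop it
        # A default is data, not a schema, so it is copied through untouched.
        out[key] = value if key == "default" else plain_schema(value)
    return out


INPUT = json.loads("""
{
  "name": "testSchema",
  "namespace": "com.avro",
  "type": "record",
  "fields": [
    {
      "name": "email",
      "type": "string",
      "doc": "email id",
      "user_field_prop": "xxxxx"
    }
  ],
  "user_schema_prop": "xxxxxx"
}
""")

print(json.dumps(plain_schema(INPUT), indent=2))
```

Applied to the input schema from the issue, this drops `user_field_prop` and `user_schema_prop` and leaves the reserved keys in place.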