[ 
https://issues.apache.org/jira/browse/AVRO-2299?focusedWorklogId=752090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752090
 ]

ASF GitHub Bot logged work on AVRO-2299:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/22 03:04
            Start Date: 04/Apr/22 03:04
    Worklog Time Spent: 10m 
      Work Description: twmb commented on code in PR #805:
URL: https://github.com/apache/avro/pull/805#discussion_r841335930


##########
doc/src/content/xdocs/spec.xml:
##########
@@ -1310,6 +1310,92 @@
         </ul>
       </section>
 
+      <section>
+        <title>Standard Canonical Form for Schemas</title>
+
+        <p>One of defined way to normalize the avro schema using
+          <em>Standard Canonical Form Transformation</em>. This involves
+          stripping unwanted properties and maintain same canonical
+          ordering. The canonical ordering involves ordering avro
+          reserved properties followed by custom properties if mentioned while
+          transforming. Normalization schema which helps to reduce the
+          total memory size of schema (removed unwanted properties and whitespace)
+          while transfer avro schema between two system and also reduce the parsing
+          time for compatibility check and schema evolution.
+        </p>
+
+        <p><em>Standard Canonical Form</em> is a transformation of a schema
+          into standard canonical ordered. It contains only avro reserved
+          properties <code>"name", "type", "fields", "symbols", "items", "values",
+            "logicalType", "size", "order", "doc", "aliases", "default"</code>
+          and <em>other (custom properties)</em> schema properties.
+        </p>
+
+        <section>
+          <title>Transforming into Standard Canonical Form</title>
+
+          <p>Assuming an input schema (in JSON form) that's already
+            UTF-8 text for a <em>valid</em> Avro schema (including all
+            quotes as required by JSON), the following transformations
+            will produce its Standard Canonical Form:</p>
+          <ul>
+            <li> [PRIMITIVES] Convert primitive schemas to their simple
+              form (e.g., <code>int</code> instead of
+              <code>{"type":"int"}</code>).</li>
+
+            <li> [FULLNAMES] Replace short names with fullnames, using
+              applicable namespaces to do so.  Then eliminate
+              <code>namespace</code> attributes, which are now redundant.</li>
+
+            <li> [STRIP] Keep only attributes that are relevant to
+              reserved properties, which are:
+              <code>type</code>, <code>name</code>,

Review Comment:
   `size` is only relevant for `fixed`; it should not be present in any other type.
   





Issue Time Tracking
-------------------

    Worklog Id:     (was: 752090)
    Time Spent: 20m  (was: 10m)

> Get Plain Schema
> ----------------
>
>                 Key: AVRO-2299
>                 URL: https://issues.apache.org/jira/browse/AVRO-2299
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.9.0, 1.8.2, 1.9.1
>            Reporter: Rumeshkrishnan Mohan
>            Assignee: Doug Cutting
>            Priority: Major
>              Labels: features, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
>  "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> AVRO also supports user defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)? 
> Input Schema: 
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id",
>       "user_field_prop": "xxxxx"
>     }
>   ],
>   "user_schema_prop": "xxxxxx"
> }{code}
> Expected Plain Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id"
>     }
>   ]
> }
> {code}
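
The transformation requested above can be sketched as follows. This is a minimal illustration in Python, not the Avro Java API and not the implementation from PR #805. The names `plain_schema` and `LITERAL_KEYS` are hypothetical; the reserved key list is copied from the panel in the issue description. Keeping `size` only on `fixed` schemas follows the review comment above and is an assumption about the intended behaviour.

```python
import json

# Reserved keys, copied from the panel in the issue description.
RESERVED_KEYS = {
    "doc", "fields", "items", "name", "namespace",
    "size", "symbols", "values", "type", "aliases", "default",
}

# Hypothetical helper set: keys whose values are literal data rather than
# nested schemas, so they are copied as-is (a record default may itself be
# a JSON object and must not be filtered).
LITERAL_KEYS = {"doc", "name", "namespace", "size", "symbols", "aliases", "default"}


def plain_schema(node):
    """Return a copy of a parsed schema with user-defined properties dropped."""
    if isinstance(node, list):
        # Unions and "fields" arrays: process each element.
        return [plain_schema(item) for item in node]
    if not isinstance(node, dict):
        # Primitive type names and other scalars pass through unchanged.
        return node
    result = {}
    for key, value in node.items():
        if key not in RESERVED_KEYS:
            continue  # user-defined property
        if key == "size" and node.get("type") != "fixed":
            continue  # assumption: "size" is only meaningful on "fixed"
        result[key] = value if key in LITERAL_KEYS else plain_schema(value)
    return result


# The input schema from the issue description.
input_schema = json.loads("""
{
  "name": "testSchema",
  "namespace": "com.avro",
  "type": "record",
  "fields": [
    {
      "name": "email",
      "type": "string",
      "doc": "email id",
      "user_field_prop": "xxxxx"
    }
  ],
  "user_schema_prop": "xxxxxx"
}
""")

print(json.dumps(plain_schema(input_schema), indent=2))
```

For the input schema from the issue, the printed result should match the "Expected Plain Schema" shown above. The sketch works on parsed JSON only and does not validate that the input is a legal Avro schema.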



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
