ta1meng opened a new issue #10426:
URL: https://github.com/apache/pulsar/issues/10426


   **Describe the bug**
   Avro schemas support a default value of null for a field. The syntax is 
`"default": null`. pulsar-admin accepts this syntax, but support for it is 
lacking elsewhere in Pulsar, resulting in `IncompatibleSchema` exceptions between 
schemas that appear identical.
   
   This ticket asks for improved logging for schema info objects that contain 
the `"default": null` specification.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Using Pulsar 2.7.1,  run `bin/pulsar standalone`
   2. Configure schema compatibility policies on a namespace:
   ```
   bin/pulsar-admin namespaces set-is-allow-auto-update-schema --disable climate/field-service
   
   bin/pulsar-admin namespaces set-schema-compatibility-strategy climate/field-service --compatibility FORWARD_TRANSITIVE
   
   bin/pulsar-admin namespaces set-schema-validation-enforce --enable climate/field-service
   
   bin/pulsar-admin namespaces set-schema-autoupdate-strategy climate/field-service --disabled
   ```
   3. Upload the following schemas into a new topic. They differ in only one 
place, the specification of `"default":null`.
   ```
   // ActionV0.schema 
   {
       "type": "AVRO",
       "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}",
       "properties": {}
   }
   
   // ActionV1.schema 
   {
       "type": "AVRO",
       "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"]}]}",
       "properties": {}
   }
   
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV0.schema climate/field-service/actions
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV1.schema climate/field-service/actions
   ```
   4. Both uploads succeed and create two schema versions, because the schemas 
are compatible. However, `schemas get` prints the two versions identically, so 
their difference cannot be seen after uploading them:
   ```
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 0
   
   {
     "name": "actions",
     "schema": {
       "name": "Action",
       "type": "record",
       "fields": [
         {
           "name": "action",
           "type": [
             "null",
             "string"
           ]
         }
       ]
     },
     "type": "AVRO",
     "properties": {}
   }
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 1
   
   {
     "name": "actions",
     "schema": {
       "name": "Action",
       "type": "record",
       "fields": [
         {
           "name": "action",
           "type": [
             "null",
             "string"
           ]
         }
       ]
     },
     "type": "AVRO",
     "properties": {}
   }
   ```
   5. Using the Python client library, I found no way to produce a message 
using version 0 of the schema. Everything I tried resulted in an 
`IncompatibleSchema` exception.
   ```
   from pulsar.schema import Record, String

   class Action(Record):
       action = String()
   ```
   6. However, the Action class above works with version 1 of the schema, the 
one without `"default":null` specified.
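
For reference, the single difference between the two uploaded definitions can be made explicit locally, before anything is sent to Pulsar. This is a minimal sketch using only the Python standard library; the helper `field_defaults` and the `MISSING` sentinel are hypothetical names introduced for illustration and are not part of Pulsar or Avro.

```python
import json

# Schema definitions copied from ActionV0.schema and ActionV1.schema in step 3.
V0 = ('{"name":"Action","type":"record","fields":'
      '[{"name":"action","type":["null","string"],"default":null}]}')
V1 = ('{"name":"Action","type":"record","fields":'
      '[{"name":"action","type":["null","string"]}]}')

# Sentinel (illustrative): separates "no default key" from "default is null".
MISSING = object()


def field_defaults(schema_json):
    """Map each field name to its declared default, or MISSING if none."""
    schema = json.loads(schema_json)
    return {f["name"]: f.get("default", MISSING) for f in schema["fields"]}


d0 = field_defaults(V0)
d1 = field_defaults(V1)

print("V0 declares a default:", d0["action"] is not MISSING)  # True
print("V1 declares a default:", d1["action"] is not MISSING)  # False
print("Parsed definitions equal:", json.loads(V0) == json.loads(V1))  # False
```

A plain `dict.get("default")` would return `None` in both cases, which is why the sketch uses a sentinel: an explicit null default and an absent default are different schemas.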
   
   **Expected behavior**
   The two schemas are _different_, so they should not be printed as 
_identical_. In this case, the `"default":null` should be printed when calling 
`bin/pulsar-admin schemas get climate/field-service/actions --version 0`. 
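
The expectation above can be phrased as a round-trip check: every key in the uploaded definition should still be present in what `schemas get` prints. The sketch below is an illustration only, using the standard library; `lost_keys` is a hypothetical helper, and the `printed` document is the version 0 output from step 4, condensed. It makes no claim about where in Pulsar the key is dropped.

```python
import json

# Definition uploaded as version 0 (from ActionV0.schema in this report).
uploaded = json.loads(
    '{"name":"Action","type":"record","fields":'
    '[{"name":"action","type":["null","string"],"default":null}]}'
)

# Output of `schemas get ... --version 0` as pasted in step 4 (condensed).
printed = json.loads("""
{
  "name": "actions",
  "schema": {
    "name": "Action",
    "type": "record",
    "fields": [
      {"name": "action", "type": ["null", "string"]}
    ]
  },
  "type": "AVRO",
  "properties": {}
}
""")


def lost_keys(original, shown, path=""):
    """Yield paths of keys present in `original` but absent from `shown`."""
    if isinstance(original, dict) and isinstance(shown, dict):
        for key, value in original.items():
            here = f"{path}/{key}"
            if key not in shown:
                yield here
            else:
                yield from lost_keys(value, shown[key], here)
    elif isinstance(original, list) and isinstance(shown, list):
        for i, (a, b) in enumerate(zip(original, shown)):
            yield from lost_keys(a, b, f"{path}/{i}")


missing = list(lost_keys(uploaded, printed["schema"]))
print(missing)  # ['/fields/0/default']
```

An empty list would mean the printed schema preserves everything that was uploaded; here the `default` key of the `action` field is reported as lost.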
   
   Further, there should be a way to construct a Record class using the Python 
client library, so an event can be written to a topic with a schema containing 
`"default":null`.
   
   **Screenshots**
   N/A.
   
   **Desktop (please complete the following information):**
    - OS: MacOS Catalina Version 10.15.17
   
   **Additional context**
   `"default":null` seems like a common default value to specify in Avro 
schemas. The `IncompatibleSchema` exception that it causes complicated efforts 
to triage mistakes and bugs that resulted in `IncompatibleSchema`. Bug tickets 
whose triage was significantly complicated by the presence of 
`"default":null`: https://github.com/apache/pulsar/issues/9571, 
https://github.com/apache/pulsar/issues/8510.
   
   The overall impact is that Avro schema support seems quite broken in Pulsar. 
There were questions about whether Kafka's Avro schema support is this buggy. If 
we had still been deciding between Kafka and Pulsar, this might have changed our 
decision.
   
   Another solution is to create a new doc page for Pulsar's Avro support. On 
that doc page, known limitations of Pulsar's Avro support should be documented. 
Sample text for this problem (it might not be correct, but it would help anyone 
experimenting with Avro support in Pulsar):
   
   ```
   Pulsar supports a subset of Avro schemas.
   
   Pulsar does not support `"default":null` for string fields. 
   
   To specify a default value of null for a string field, simply omit that 
clause. 
   
   This is because for string fields without default values, Pulsar consumers 
will default these fields to null and auto-convert null into the empty string. 
   
   ```



