ta1meng opened a new issue #10426:
URL: https://github.com/apache/pulsar/issues/10426
**Describe the bug**
Avro schemas support a default value of null, written as `"default": null`.
pulsar-admin accepts this syntax, but support for it is lacking elsewhere in
Pulsar, resulting in `IncompatibleSchema` exceptions between schemas that
appear identical.
This ticket asks for improved logging for schema info objects that contain
the `"default": null` specification.
**To Reproduce**
Steps to reproduce the behavior:
1. Using Pulsar 2.7.1, run `bin/pulsar standalone`
2. Configure schema compatibility policies on a namespace:
```
bin/pulsar-admin namespaces set-is-allow-auto-update-schema --disable climate/field-service
bin/pulsar-admin namespaces set-schema-compatibility-strategy climate/field-service --compatibility FORWARD_TRANSITIVE
bin/pulsar-admin namespaces set-schema-validation-enforce --enable climate/field-service
bin/pulsar-admin namespaces set-schema-autoupdate-strategy climate/field-service --disabled
```
3. Upload the following schemas into a new topic. They differ in only one
place, the specification of `"default":null`.
```
// ActionV0.schema
{
  "type": "AVRO",
  "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}",
  "properties": {}
}
// ActionV1.schema
{
  "type": "AVRO",
  "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"]}]}",
  "properties": {}
}
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV0.schema climate/field-service/actions
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV1.schema climate/field-service/actions
```
4. Both schema versions upload successfully because they are compatible.
However, they are printed identically, so it is impossible to see the
difference between them after uploading:
```
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 0
{
"name": "actions",
"schema": {
"name": "Action",
"type": "record",
"fields": [
{
"name": "action",
"type": [
"null",
"string"
]
}
]
},
"type": "AVRO",
"properties": {}
}
tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 1
{
"name": "actions",
"schema": {
"name": "Action",
"type": "record",
"fields": [
{
"name": "action",
"type": [
"null",
"string"
]
}
]
},
"type": "AVRO",
"properties": {}
}
```
5. Using the Python client library, I found no way to produce a message
using version 0 of the schema. Everything I tried resulted in an
`IncompatibleSchema` exception.
```
from pulsar.schema import Record, String

class Action(Record):
    action = String()
```
6. However, the `Action` class above works with version 1 of the schema, the
one without `"default":null` specified.
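As a rough local check (a sketch that is not part of the original reproduction; the helper names `schema_file`, `field_attributes`, and `diff_fields` are made up for illustration), the two definitions from step 3 can be compared attribute by attribute using only Python's standard `json` module. It treats a key that is absent differently from a key whose value is null, which is the distinction that is lost in the printed output of step 4:

```python
import json

ABSENT = "<absent>"  # sentinel: the key is missing, as opposed to set to null


def schema_file(avro_definition):
    """Build the text of an upload file in the shape used in step 3."""
    return json.dumps(
        {"type": "AVRO", "schema": json.dumps(avro_definition), "properties": {}}
    )


def field_attributes(schema_file_text):
    """Map each field name to its attributes in the embedded Avro schema."""
    outer = json.loads(schema_file_text)
    avro = json.loads(outer["schema"])  # the Avro definition is itself a JSON string
    return {field["name"]: field for field in avro["fields"]}


def diff_fields(old_text, new_text):
    """Return {field: {attribute: (old, new)}} for every attribute that differs."""
    old, new = field_attributes(old_text), field_attributes(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, {}), new.get(name, {})
        for key in sorted(set(a) | set(b)):
            va, vb = a.get(key, ABSENT), b.get(key, ABSENT)
            if va != vb:
                changes.setdefault(name, {})[key] = (va, vb)
    return changes


# The two Avro definitions from step 3.
ACTION_V0 = schema_file({
    "name": "Action",
    "type": "record",
    "fields": [{"name": "action", "type": ["null", "string"], "default": None}],
})
ACTION_V1 = schema_file({
    "name": "Action",
    "type": "record",
    "fields": [{"name": "action", "type": ["null", "string"]}],
})

print(diff_fields(ACTION_V0, ACTION_V1))
```

Run against the files as uploaded, this reports a single difference on the `action` field: `default` is null in version 0 and absent in version 1.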
**Expected behavior**
The two schemas are _different_, so they should not be printed as
_identical_. In this case, the `"default":null` should be printed when calling
`bin/pulsar-admin schemas get climate/field-service/actions --version 0`.
Further, there should be a way to construct a Record class using the Python
client library, so an event can be written to a topic with a schema containing
`"default":null`.
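To make the expected output concrete, here is a minimal sketch of the two printing behaviors. It only illustrates the symptom; I am not claiming this is how pulsar-admin is implemented, and `render` and its `keep_nulls` flag are hypothetical names.

```python
import json


def render(avro_schema, keep_nulls=True):
    """Pretty-print an Avro schema dict, optionally dropping null-valued keys."""

    def strip(node):
        # Recursively remove dict entries whose value is None (JSON null).
        if isinstance(node, dict):
            return {k: strip(v) for k, v in node.items() if v is not None}
        if isinstance(node, list):
            return [strip(v) for v in node]
        return node

    return json.dumps(avro_schema if keep_nulls else strip(avro_schema), indent=2)


v0 = {
    "name": "Action",
    "type": "record",
    "fields": [{"name": "action", "type": ["null", "string"], "default": None}],
}
v1 = {
    "name": "Action",
    "type": "record",
    "fields": [{"name": "action", "type": ["null", "string"]}],
}

# Dropping nulls makes the two versions print identically (what step 4 shows).
print(render(v0, keep_nulls=False) == render(v1, keep_nulls=False))

# Keeping nulls preserves the difference (the behavior this ticket asks for).
print(render(v0))
```

With null-valued keys kept, the output for version 0 includes the `"default": null` line, so the two versions can be told apart by reading them.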
**Screenshots**
N/A.
**Desktop (please complete the following information):**
- OS: macOS Catalina Version 10.15.17
**Additional context**
`"default":null` seems like a common default value to specify in Avro
schemas. The `IncompatibleSchema` exception that it causes complicated our
efforts to triage other mistakes and bugs that also resulted in
`IncompatibleSchema`. Bug tickets whose triage was significantly complicated
by the presence of `"default":null`:
https://github.com/apache/pulsar/issues/9571,
https://github.com/apache/pulsar/issues/8510.
The overall impact is that Avro schema support seems quite broken in Pulsar.
There were questions about whether Kafka's Avro schema support is this buggy.
If we had still been deciding between Kafka and Pulsar, this might have changed
our decision.
Another solution is to create a new doc page for Pulsar's Avro support. On
that doc page, known limitations of Pulsar's Avro support should be documented.
Sample text for this problem (it might not be correct, but it would help anyone
experimenting with Avro support in Pulsar):
```
Pulsar supports a subset of Avro schemas.
Pulsar does not support `"default":null` for string fields.
To specify a default value of null for a string field, simply omit that
clause.
This is because, for string fields without default values, Pulsar consumers
default these fields to null and auto-convert null into the empty string.
```