ta1meng commented on issue #9571:
URL: https://github.com/apache/pulsar/issues/9571#issuecomment-828086647
@codelipenghui @congbobo184 While the original issue was fixed, I encountered
three other bugs that also result in an `IncompatibleSchema` exception.
Consider the workaround that I implemented locally:
```
@classmethod
def schema(cls):
    schema = {
        'name': str(cls.__name__),
        'type': 'record',
        'fields': []
    }
    # Do NOT sort the keys!!
    for name in cls._fields.keys():
        field = cls._fields[name]
        field_type = field.schema() if field._required else (
            # HACK for default values
            [field.schema(), 'null'] if field._default is not None else
            ['null', field.schema()]
        )
        schema['fields'].append({
            'name': name,
            'type': field_type
        } if field._default is None else {
            # HACK for default values
            'name': name,
            'type': field_type,
            'default': field._default
        })
    return schema
```
which overrides `Record::schema()` in `definition.py`.
`First issue`: Pulsar's schema comparison logic seems to be textual in nature,
so if two fields are declared in a different order, the schema comparison
returns "incompatible" even though the Avro schemas are compatible. The
workaround removes the sorting of field names, so they appear in the schema in
the same order as they are declared in code. This issue can be fixed either
server-side or in the Python client library.
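To illustrate the ordering problem, here is a minimal standalone sketch. It
does not use the Pulsar client at all; `DECLARED_FIELDS`, `build_schema` and
`by_name` are hypothetical names made up for this example, and the textual
comparison only stands in for what the server appears to do.

```python
import json

# Hypothetical field declarations, listed in the order they appear in code.
DECLARED_FIELDS = {"zebra": "string", "apple": "int"}


def build_schema(record_name, fields, sort_names):
    """Build a minimal Avro-style record schema as a dict."""
    names = sorted(fields) if sort_names else list(fields)
    return {
        "name": record_name,
        "type": "record",
        "fields": [{"name": n, "type": fields[n]} for n in names],
    }


def by_name(schema):
    """Map field name to field type, ignoring declaration order."""
    return {f["name"]: f["type"] for f in schema["fields"]}


declared_order = build_schema("Example", DECLARED_FIELDS, sort_names=False)
sorted_order = build_schema("Example", DECLARED_FIELDS, sort_names=True)

# A purely textual comparison treats the two schemas as different ...
texts_match = json.dumps(declared_order) == json.dumps(sorted_order)

# ... even though both declare exactly the same named fields and types.
fields_match = by_name(declared_order) == by_name(sorted_order)

print(texts_match, fields_match)
```

With Python 3.7+ (where dicts keep insertion order) this prints `False True`:
the serialized text differs while the set of named fields is identical.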
`Second issue`: there is no default value support in the schema generation
code. I hacked it in. I'm new to Python and I'm sure this code can be written
better.
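For reference, the per-field logic of the workaround can be pulled out into a
small helper, which may be easier to review. This is only a sketch:
`field_entry` is a hypothetical name, not part of the Pulsar client. The union
ordering follows my understanding that the Avro specification has historically
required a union field's default to match the first branch of the union.

```python
def field_entry(name, type_schema, required=False, default=None):
    """Return one entry for the 'fields' list of an Avro record schema.

    Hypothetical helper mirroring the workaround above; not a Pulsar API.
    """
    if required:
        field_type = type_schema
    elif default is not None:
        # Non-null default: put the concrete type first in the union.
        field_type = [type_schema, "null"]
    else:
        # No default given: keep the usual nullable ordering.
        field_type = ["null", type_schema]

    entry = {"name": name, "type": field_type}
    if default is not None:
        entry["default"] = default
    return entry


print(field_entry("count", "int", default=0))
print(field_entry("label", "string"))
print(field_entry("id", "string", required=True))
```

Comparing against `None` with `is` rather than relying on truthiness means
that a default such as `0` or an empty string is still emitted.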
`Third issue`: `"default": null` is only partially supported, and is rarely
logged. This one took me the longest to figure out, because I saw identical
schemas printed that nevertheless resulted in an `IncompatibleSchema`
exception. While this issue would be tricky to fix in the Pulsar client
library, we can improve things greatly by logging `"default": null` whenever
it is part of the schema. This one differs from the first two issues in that
it seems to require a sweep across multiple Pulsar projects, so I will file it
separately.
So this ticket can be used to track `First issue` and `Second issue`, as
they are both localized to a single method in the Pulsar Python client library.