I'm picking up the languishing ticket AVRO-1938. One issue with the PR
(https://github.com/apache/avro/pull/143) is that it strips the
default value from a field:
input:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
output:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'
The specification
(https://avro.apache.org/docs/1.9.1/spec.html#Parsing+Canonical+Form+for+Schemas)
doesn't address record fields at all. If we are to assume that the
same rules apply to fields as to other schema parts, then the [STRIP]
rule says we should drop "default" and "order" from fields. But that
can't be right -- default is crucial for readers, and two schemas
differing only by a default are certainly different schemas and
ought to have different fingerprints.
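To make the concern concrete, here is a minimal sketch of what a literal
reading of [STRIP] and [ORDER] does to the example above. This is not the
code from the PR; the helper name strip_and_order is hypothetical, and it
assumes the same rules apply to field objects as to any other JSON object
in the schema. It skips the other rules ([PRIMITIVES], [FULLNAMES], and
so on), which don't matter for this example.

```python
import json

# Attributes kept by [STRIP], listed in the order required by [ORDER].
KEPT_IN_ORDER = ("name", "type", "fields", "symbols", "items", "values", "size")


def strip_and_order(node):
    """Hypothetical helper: apply only [STRIP] and [ORDER], recursively.

    Assumption: field objects are treated like any other JSON object.
    """
    if isinstance(node, dict):
        # Keep only the whitelisted attributes, emitted in [ORDER] order.
        return {k: strip_and_order(node[k]) for k in KEPT_IN_ORDER if k in node}
    if isinstance(node, list):
        return [strip_and_order(item) for item in node]
    return node


def canonical(schema_json):
    # Compact separators approximate the [WHITESPACE] rule.
    return json.dumps(strip_and_order(json.loads(schema_json)), separators=(",", ":"))


with_default = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
without_default = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'

canon_a = canonical(with_default)
canon_b = canonical(without_default)

print(canon_a)             # the "default" attribute is gone
print(canon_a == canon_b)  # True: both schemas collapse to the same text
```

Under that reading, the two schemas produce identical canonical text, so
any fingerprint computed from it would be identical as well.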
Did I miss something in reading the spec, or is this a gap? How should
I interpret the spec in implementing parsing canonical form for record
fields? Specifically, should the canonical form of a record field
preserve its order and default values in the [STRIP] rule, and if so,
where do those things go in the [ORDER] rule?
Thanks,
Michael A. Smith