Thanks for bringing this up, Michael. On my current project, I've run into this one as well. We're trying to build a schema registry and take the fingerprint of the canonical schema in order to check whether something changed. The issue here is that the canonical form is the minimal schema that ensures binary compatibility. Default values are only considered when reading the file and do not affect the binary encoding. That was the original idea, as far as I could derive from https://issues.apache.org/jira/browse/AVRO-2299
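
To make the effect concrete, here is a rough Python sketch of why a fingerprint taken over the canonical form cannot detect a changed default. It is not the Avro library's implementation: the function names (canonicalize, canonical_form, fingerprint) are made up for illustration, it covers only the [STRIP] and [ORDER] rules plus a simplified [PRIMITIVES], and it assumes SHA-256 as the fingerprint algorithm. It uses the schema from Michael's message below.

```python
import hashlib
import json

# Attributes kept by the spec's [STRIP] rule, listed in [ORDER] order.
KEPT_IN_ORDER = ("name", "type", "fields", "symbols", "items", "values", "size")
PRIMITIVES = ("null", "boolean", "int", "long", "float", "double", "bytes", "string")


def canonicalize(schema):
    """Simplified sketch: [STRIP], [ORDER] and a basic [PRIMITIVES] only."""
    if isinstance(schema, list):  # unions, field lists, symbol lists
        return [canonicalize(item) for item in schema]
    if isinstance(schema, dict):
        # Drop everything not in KEPT_IN_ORDER ("default", "order", "doc", ...).
        kept = {key: canonicalize(schema[key]) for key in KEPT_IN_ORDER if key in schema}
        # {"type": "int"} collapses to "int".
        if set(kept) == {"type"} and isinstance(kept["type"], str) and kept["type"] in PRIMITIVES:
            return kept["type"]
        return kept
    return schema  # plain strings and numbers


def canonical_form(schema_json):
    """Return the (simplified) canonical JSON text without whitespace."""
    return json.dumps(canonicalize(json.loads(schema_json)),
                      separators=(",", ":"), ensure_ascii=False)


def fingerprint(schema_json):
    """SHA-256 over the canonical text (assumed algorithm for this sketch)."""
    return hashlib.sha256(canonical_form(schema_json).encode("utf-8")).hexdigest()


with_default = ('{"name":"example","type":"record","fields":'
                '[{"name":"def","type":"bytes","default":"abc"}]}')
without_default = ('{"name":"example","type":"record","fields":'
                   '[{"name":"def","type":"bytes"}]}')

print(canonical_form(with_default))
print(fingerprint(with_default) == fingerprint(without_default))
```

With this sketch both schemas reduce to the same canonical text, so a registry that compares only these fingerprints would treat them as identical even though their defaults differ.
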
For me, the first step would be to clarify in the Avro spec what the different schema forms are (plain, canonical, etc.), so that we are sure every implementation has the same notion of a canonical schema. There is a PR that addresses this issue in the Java implementation and updates the spec: https://github.com/apache/avro/pull/452. However, it looks like something is off with that PR.

Cheers, Fokko

On Sun, 3 Nov 2019 at 22:01, Michael A. Smith <[email protected]> wrote:

> I'm picking up the languishing ticket AVRO-1938. One issue with the PR
> (https://github.com/apache/avro/pull/143) is that it strips the
> default value from a field:
>
> input:
> '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
> output:
> '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'
>
> The specification
> (https://avro.apache.org/docs/1.9.1/spec.html#Parsing+Canonical+Form+for+Schemas)
> doesn't address record fields at all. If we are to assume that the
> same rules apply to fields as to other schema parts, then the [STRIP]
> rule says we should drop "default" and "order" from fields. But that
> can't be right -- default is crucial for readers, and two schemas
> differing only by a default are certainly different schemas and
> ought to have different fingerprints.
>
> Did I miss something in reading the spec, or is this a gap? How should
> I interpret the spec in implementing parsing canonical form for record
> fields? Specifically, should the canonical form of a record field
> preserve its order and default values in the [STRIP] rule, and if so,
> where do those things go in the [ORDER] rule?
>
> Thanks,
> Michael A. Smith
