Hi Oleksii,

Would you like to send a Pull Request with the suggested improvement?
Thank you in advance!
Martin

On Wed, Dec 21, 2022 at 11:27 AM Oleksii Shevtsov <[email protected]> wrote:

> Good morning,
>
> It seems like this email is the simplest way to signal about issues in
> *https://github.com/apache/avro/tree/master/lang/py*
>
> There is an issue with the class *SchemaCompatibilityResult*, defined in
> *compatibility.py*:
>
> class SchemaCompatibilityResult:
>     def __init__(
>         self,
>         compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
>         incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
>         messages: Optional[Set[str]] = None,
>         locations: Optional[Set[str]] = None,
>     ):
>         self.locations = locations or {"/"}
>         self.messages = messages or set()
>         self.compatibility = compatibility
>         self.incompatibilities = incompatibilities or []
>
> As you can see the two attributes, *locations* and *messages*, are defined
> as python sets and therefore are unordered. When a compatibility check is
> made between a reader and a writer schema, the check is made recursively,
> and results of the above type are merged together for each incompatibility
> found. The problem is that locations and messages must go in pairs, while
> they are defined as separate attributes, and merged as follows, see
> *compatibility.py:98*:
>
> def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
>     ...
>     messages = this.messages.union(that.messages)
>     locations = this.locations.union(that.locations)
>     ...
>
> Since python sets are not ordered, it is possible to get *messages* that
> are not in sync with their *locations*.
>
> Using python lists instead of sets would solve this problem, but IMHO a
> better solution is to encapsulate location and message in a simple class,
> so they are always bound together.
>
> Best wishes,
> Oleksii
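
For reference, the encapsulation suggested in the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual Avro code: the `Incompatibility` class, the `merge_paired` helper, and the sample locations and messages are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Incompatibility:
    """Hypothetical value object that keeps a message bound to its location."""

    location: str
    message: str


def merge_paired(this: List[Incompatibility], that: List[Incompatibility]) -> List[Incompatibility]:
    """Hypothetical merge: concatenate both lists and drop duplicates, keeping first-seen order."""
    # dict.fromkeys preserves insertion order (guaranteed since Python 3.7);
    # frozen dataclasses are hashable, so they can be used as dict keys.
    return list(dict.fromkeys(this + that))


# With two separate sets, nothing records which message belongs to which
# location once the sets have been unioned.
messages = {"missing field"}.union({"type mismatch"})
locations = {"/fields/0"}.union({"/fields/1/type"})

# With paired objects, the association survives the merge.
left = [Incompatibility("/fields/0", "missing field")]
right = [
    Incompatibility("/fields/1/type", "type mismatch"),
    Incompatibility("/fields/0", "missing field"),  # duplicate of the entry in `left`
]
merged = merge_paired(left, right)

for item in merged:
    print(f"{item.location}: {item.message}")
```

A frozen dataclass is used here so that instances are hashable and can still be de-duplicated the way the current sets are, while each message stays attached to its location.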
