Oleksii Shevtsov created AVRO-3694:
--------------------------------------
Summary: Correlate messages with locations in reader/writer schema
compatibility check results
Key: AVRO-3694
URL: https://issues.apache.org/jira/browse/AVRO-3694
Project: Apache Avro
Issue Type: Improvement
Components: python
Reporter: Oleksii Shevtsov
There is an issue with the class {*}SchemaCompatibilityResult{*}, defined in
{*}compatibility.py{*}:
{code:python}
class SchemaCompatibilityResult:
    def __init__(
        self,
        compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
        incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
        messages: Optional[Set[str]] = None,
        locations: Optional[Set[str]] = None,
    ):
        self.locations = locations or {"/"}
        self.messages = messages or set()
        self.compatibility = compatibility
        self.incompatibilities = incompatibilities or []
{code}
Here, *locations* and *messages* are Python sets and are therefore unordered.
A compatibility check between a reader and a writer schema runs recursively,
and the results for each incompatibility found are merged into a single result
of the above type. Each message must be paired with its location, but the two
are stored as separate attributes and are currently merged independently of
each other, see {*}compatibility.py{*}:
{code:python}
def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
    ...
    messages = this.messages.union(that.messages)
    locations = this.locations.union(that.locations)
    ...
{code}
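Since Python sets are unordered, nothing ties a given message to its location
after the merge, so *messages* can end up out of sync with their
{*}locations{*}. Sets also drop duplicates, so an identical message raised at
two different locations is kept only once.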
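The mismatch can be reproduced without Avro at all. The snippet below is a minimal sketch that only mimics the two {{union}} calls from the merge step; the function name {{merge_sets}}, the message text, and the location paths are made up for illustration and are not taken from the library.

```python
from typing import Set, Tuple


def merge_sets(
    this_messages: Set[str],
    this_locations: Set[str],
    that_messages: Set[str],
    that_locations: Set[str],
) -> Tuple[Set[str], Set[str]]:
    # Hypothetical stand-in for the merge step: each set is unioned on its own,
    # so no link between a message and its location survives.
    return this_messages | that_messages, this_locations | that_locations


# Illustrative values only: the same message reported at two different locations.
msg = "reader type: int not compatible with writer type: string"
messages, locations = merge_sets(
    {msg}, {"/fields/0/type"},
    {msg}, {"/fields/1/type"},
)

# The duplicate message collapses, so the two sets no longer have the same size
# and cannot be zipped back into (location, message) pairs.
print(len(messages), len(locations))  # 1 2
```

Even when the two sets happen to have the same size, iterating over them gives no guarantee that the n-th message belongs to the n-th location.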
h2. Proposed solution
Encapsulate *location* and *message* in a simple data class so that each
message stays attached to the location it refers to.
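One possible shape for this, as a rough sketch rather than a finished patch: the class name {{IncompatibilityDetail}} and the helper {{merge_details}} are hypothetical names chosen for this example, and the message and location values are again illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IncompatibilityDetail:
    """Hypothetical pairing of one message with the location it refers to."""

    location: str
    message: str


def merge_details(
    this: List[IncompatibilityDetail],
    that: List[IncompatibilityDetail],
) -> List[IncompatibilityDetail]:
    # frozen=True makes instances hashable, so dict.fromkeys can drop exact
    # duplicates while keeping first-seen order (dicts preserve insertion order).
    return list(dict.fromkeys([*this, *that]))


# Illustrative values only.
msg = "reader type: int not compatible with writer type: string"
first = IncompatibilityDetail(location="/fields/0/type", message=msg)
second = IncompatibilityDetail(location="/fields/1/type", message=msg)

merged = merge_details([first], [second, first])
for detail in merged:
    print(detail.location, "->", detail.message)
```

With this layout the same message at two locations yields two entries, and only a repeated (location, message) pair is deduplicated.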