[jira] [Updated] (AVRO-3694) Correlate messages with locations in reader/writer schema compatibility check results

Oleksii Shevtsov (Jira) Wed, 21 Dec 2022 05:47:05 -0800


     [ https://issues.apache.org/jira/browse/AVRO-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Oleksii Shevtsov updated AVRO-3694:
-----------------------------------
    Description: 
There is an issue with the class {*}SchemaCompatibilityResult{*}, defined in 
{*}compatibility.py{*}:
{code:java}
class SchemaCompatibilityResult:
    def __init__(
        self,
        compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
        incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
        messages: Optional[Set[str]] = None,
        locations: Optional[Set[str]] = None,
    ):
        self.locations = locations or {"/"}
        self.messages = messages or set()
        self.compatibility = compatibility
        self.incompatibilities = incompatibilities or []{code}
Here, *locations* and *messages* are defined as Python sets and are therefore 
unordered. A compatibility check between a reader and a writer schema runs 
recursively, and results of the above type are merged together for each 
incompatibility found. The problem is that each message belongs to a specific 
location, yet the two are stored as separate attributes and merged 
independently, see {*}compatibility.py{*}:
{code:java}
def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
    ...
        messages = this.messages.union(that.messages)
        locations = this.locations.union(that.locations)
    ...{code}
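Since Python sets are unordered and the two unions are computed independently, 
the merged result no longer records which message belongs to which location. 
Sets also collapse duplicates, so the same message raised at two locations is 
kept only once, and *messages* can fall out of sync with their {*}locations{*}.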
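
A minimal sketch of how the pairing gets lost, using made-up messages and locations (these strings are illustrative assumptions, not actual output of the avro package):

```python
# Hypothetical data: each result holds messages and locations as separate sets.
this_messages = {"missing default value"}
this_locations = {"/fields/0"}

that_messages = {"type mismatch", "missing default value"}
that_locations = {"/fields/1/type", "/fields/2"}

# Same merge strategy as in the snippet above: two independent unions.
messages = this_messages.union(that_messages)
locations = this_locations.union(that_locations)

# Three incompatibilities were reported, but the repeated message collapsed,
# so the two sets no longer even have the same size.
print(len(messages), len(locations))
```

With two messages and three locations there is no way to tell which location the repeated message referred to.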
h2. Proposed solution

Encapsulate *location* and *message* into a simple data class (or named tuple) 
to keep these two pieces of information together.
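
One possible shape for this, as a sketch only: the names `Incompatibility`, `PairedResult` and `merge_details` are hypothetical and not part of the avro package, and `PairedResult` is a simplified stand-in for {*}SchemaCompatibilityResult{*} that leaves out the other attributes.

```python
from typing import List, NamedTuple, Optional


class Incompatibility(NamedTuple):
    """Hypothetical pair type keeping a message next to its location."""

    location: str
    message: str


class PairedResult:
    """Simplified stand-in for SchemaCompatibilityResult (assumption)."""

    def __init__(self, details: Optional[List[Incompatibility]] = None):
        self.details = details or []


def merge_details(this: PairedResult, that: PairedResult) -> PairedResult:
    # Concatenate both lists and drop exact duplicate pairs.
    # dict.fromkeys keeps first-seen order (Python 3.7+).
    merged = list(dict.fromkeys(this.details + that.details))
    return PairedResult(merged)


a = PairedResult([Incompatibility("/fields/0", "missing default value")])
b = PairedResult(
    [
        Incompatibility("/fields/1/type", "type mismatch"),
        Incompatibility("/fields/0", "missing default value"),
    ]
)
merged = merge_details(a, b)
for item in merged.details:
    print(f"{item.location}: {item.message}")
```

Because each entry carries both fields, merging can reorder or deduplicate entries without separating a message from its location. A named tuple is hashable by default, which is what allows the duplicate removal here; a frozen data class would work the same way.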

  was:
There is an issue with the class {*}SchemaCompatibilityResult{*}, defined in 
{*}compatibility.py{*}:
{code:java}
class SchemaCompatibilityResult:
    def __init__(
        self,
        compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
        incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
        messages: Optional[Set[str]] = None,
        locations: Optional[Set[str]] = None,
    ):
        self.locations = locations or {"/"}
        self.messages = messages or set()
        self.compatibility = compatibility
        self.incompatibilities = incompatibilities or []{code}
Here, *locations* and *messages* are defined as python sets and therefore are 
unordered. When a compatibility check is made between a reader and a writer 
schema, the check is made recursively, and results of the above type are merged 
together for each incompatibility found. The problem is that locations and 
messages must go in pairs, while they are defined as separate attributes, and 
are currently merged as follows, see {*}compatibility.py{*}:
{code:java}
def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
    ...
        messages = this.messages.union(that.messages)
        locations = this.locations.union(that.locations)
    ...{code}
Since python sets are not ordered, it is possible to get *messages* that are 
not in sync with their {*}locations{*}.
h2. Proposed solution

Encapsulate *location* and *message* into a simple data class to keep these two 
pieces of information together.


> Correlate messages with locations in reader/writer schema compatibility check 
> results
> -------------------------------------------------------------------------------------
>
>                 Key: AVRO-3694
>                 URL: https://issues.apache.org/jira/browse/AVRO-3694
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: python
>            Reporter: Oleksii Shevtsov
>            Priority: Major
>
> There is an issue with the class {*}SchemaCompatibilityResult{*}, defined in 
> {*}compatibility.py{*}:
> {code:java}
> class SchemaCompatibilityResult:
>     def __init__(
>         self,
>         compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
>         incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
>         messages: Optional[Set[str]] = None,
>         locations: Optional[Set[str]] = None,
>     ):
>         self.locations = locations or {"/"}
>         self.messages = messages or set()
>         self.compatibility = compatibility
>         self.incompatibilities = incompatibilities or []{code}
> Here, *locations* and *messages* are defined as Python sets and are therefore 
> unordered. A compatibility check between a reader and a writer schema runs 
> recursively, and results of the above type are merged together for each 
> incompatibility found. The problem is that each message belongs to a specific 
> location, yet the two are stored as separate attributes and merged 
> independently, see {*}compatibility.py{*}:
> {code:java}
> def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
>     ...
>         messages = this.messages.union(that.messages)
>         locations = this.locations.union(that.locations)
>     ...{code}
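> Since Python sets are unordered and the two unions are computed 
> independently, the merged result no longer records which message belongs to 
> which location. Sets also collapse duplicates, so the same message raised at 
> two locations is kept only once, and *messages* can fall out of sync with 
> their {*}locations{*}.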
> h2. Proposed solution
> Encapsulate *location* and *message* into a simple data class (or named 
> tuple) to keep these two pieces of information together.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
