[ 
https://issues.apache.org/jira/browse/DRILL-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030337#comment-16030337
 ] 

Jinfeng Ni commented on DRILL-5525:
-----------------------------------

At least one issue has been opened for BatchSchema/MaterializedField.equals() 
methods in the past : we did not include the nested fields in the comparison.  
That means we may not be able to detect schema change if the batch contains 
nested fields. 

https://issues.apache.org/jira/browse/DRILL-3487


> Inconsistent, unhelpful semantics for batch, field schema comparison
> --------------------------------------------------------------------
>
>                 Key: DRILL-5525
>                 URL: https://issues.apache.org/jira/browse/DRILL-5525
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> Drill provides two classes to define schema at execution time:
> * {{MaterializedField}} - describes one column (or a collection of columns 
> for a map)
> * {{BatchSchema}} - describes the set of columns that make up a row
> Each class provides an {{isEqual()}} method. However it seems these methods 
> may not be used much because the contain a number of flaws and contradictions.
> * {{MaterializedField}} has a comment that says it compares only names; but 
> then it turns around and compares the object field-by-field using Protobuf:
> {code}
>     // DRILL-1872: Compute equals only on key. See also the comment
>     // in MapVector$MapTransferPair
>     return this.name.equalsIgnoreCase(other.name) &&
>             Objects.equals(this.type, other.type);
> {code}
> * {{MaterializedField}} ends up doing a physical comparison of fields rather 
> than a logical comparison. Varchar columns are defined, at the logical level 
> as Varchar. But, at the physical level, a Varchar is a two-part column: it 
> has a child that represents the {{$offsets$}} vector. As a result, a 
> comparison between logical and physical schema returns false, even if both is 
> just a Varchar.
> * {{BatchSchema}} ends up being inconsistent because of the above confusion. 
> It first (seemingly) compares names, then tries to compare types. But, 
> because the {{MaterializedField}} method actually compares all fields, the 
> type comparison does nothing.
> {code}
>     if (!fields.equals(other.fields)) { // Compare all fields
>       return false;
>     }
>     for (int i = 0; i < fields.size(); i++) { // So this loop is a no-op
>       MajorType t1 = fields.get(i).getType();
>       MajorType t2 = other.fields.get(i).getType();
>       if (t1 == null) {
>         if (t2 != null) {
>           return false;
>         }
>       } else {
>         if (!majorTypeEqual(t1, t2)) {
>           return false;
>         }
>       }
>     }
> {code}
> The result is that one is not quite sure what the two methods are supposed to 
> compare. Is {{MaterializedField}} supposed to:
> * Do a physical compare (existing behavior)?
> * Do a name-only compare (as the comment suggests)?
> * Do a logical comparison (as unit tests would want)?
> And for {{BatchSchema}}, should it:
> * Do a physical compare (existing behavior)?
> * Do a type-aware comparison (as in the loop that does nothing)?
> Further note that neither of the methods do anything special for map fields: 
> they end up being compared as part of the protobuf comparison. But, we shold 
> probably apply the same rules to map fields as we apply to top-level fields 
> (as expressed in the second loop above.)
> This ticket requests:
> * Document current uses.
> * Determine the semantics of the {{isEqual()}} method.
> * Create additional methods as needed: {{isPhysicallyEqual()}}, 
> {{isLogicallyEqual()}}, {{hasSameNames()}}, or whatever.
> Not that this issue is classic: it seems that "equal" is well defined 
> concept, but as the Greek philosopher 
> [Heraclitus|https://en.wikipedia.org/wiki/Heraclitus] suggested with his 
> famous quote that "you can never step into the same river twice", the concept 
> of "sameness" is fluid and ad-hoc. It is up to us to define the semantics for 
> equality, and sometimes we need more than one concept.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to