[ https://issues.apache.org/jira/browse/DRILL-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030337#comment-16030337 ]
Jinfeng Ni commented on DRILL-5525: ----------------------------------- At least one issue has been opened for BatchSchema/MaterializedField.equals() methods in the past : we did not include the nested fields in the comparison. That means we may not be able to detect schema change if the batch contains nested fields. https://issues.apache.org/jira/browse/DRILL-3487 > Inconsistent, unhelpful semantics for batch, field schema comparison > -------------------------------------------------------------------- > > Key: DRILL-5525 > URL: https://issues.apache.org/jira/browse/DRILL-5525 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.10.0 > Reporter: Paul Rogers > Assignee: Paul Rogers > Priority: Minor > Fix For: 1.11.0 > > > Drill provides two classes to define schema at execution time: > * {{MaterializedField}} - describes one column (or a collection of columns > for a map) > * {{BatchSchema}} - describes the set of columns that make up a row > Each class provides an {{isEqual()}} method. However it seems these methods > may not be used much because the contain a number of flaws and contradictions. > * {{MaterializedField}} has a comment that says it compares only names; but > then it turns around and compares the object field-by-field using Protobuf: > {code} > // DRILL-1872: Compute equals only on key. See also the comment > // in MapVector$MapTransferPair > return this.name.equalsIgnoreCase(other.name) && > Objects.equals(this.type, other.type); > {code} > * {{MaterializedField}} ends up doing a physical comparison of fields rather > than a logical comparison. Varchar columns are defined, at the logical level > as Varchar. But, at the physical level, a Varchar is a two-part column: it > has a child that represents the {{$offsets$}} vector. As a result, a > comparison between logical and physical schema returns false, even if both is > just a Varchar. > * {{BatchSchema}} ends up being inconsistent because of the above confusion. > It first (seemingly) compares names, then tries to compare types. But, > because the {{MaterializedField}} method actually compares all fields, the > type comparison does nothing. > {code} > if (!fields.equals(other.fields)) { // Compare all fields > return false; > } > for (int i = 0; i < fields.size(); i++) { // So this loop is a no-op > MajorType t1 = fields.get(i).getType(); > MajorType t2 = other.fields.get(i).getType(); > if (t1 == null) { > if (t2 != null) { > return false; > } > } else { > if (!majorTypeEqual(t1, t2)) { > return false; > } > } > } > {code} > The result is that one is not quite sure what the two methods are supposed to > compare. Is {{MaterializedField}} supposed to: > * Do a physical compare (existing behavior)? > * Do a name-only compare (as the comment suggests)? > * Do a logical comparison (as unit tests would want)? > And for {{BatchSchema}}, should it: > * Do a physical compare (existing behavior)? > * Do a type-aware comparison (as in the loop that does nothing)? > Further note that neither of the methods do anything special for map fields: > they end up being compared as part of the protobuf comparison. But, we shold > probably apply the same rules to map fields as we apply to top-level fields > (as expressed in the second loop above.) > This ticket requests: > * Document current uses. > * Determine the semantics of the {{isEqual()}} method. > * Create additional methods as needed: {{isPhysicallyEqual()}}, > {{isLogicallyEqual()}}, {{hasSameNames()}}, or whatever. > Not that this issue is classic: it seems that "equal" is well defined > concept, but as the Greek philosopher > [Heraclitus|https://en.wikipedia.org/wiki/Heraclitus] suggested with his > famous quote that "you can never step into the same river twice", the concept > of "sameness" is fluid and ad-hoc. It is up to us to define the semantics for > equality, and sometimes we need more than one concept. -- This message was sent by Atlassian JIRA (v6.3.15#6346)