Rymar Maksym created DRILL-7812:
-----------------------------------

             Summary: Broken equals/hashcode contract 
                 Key: DRILL-7812
                 URL: https://issues.apache.org/jira/browse/DRILL-7812
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Rymar Maksym
            Assignee: Rymar Maksym


The *MaterializedField* class [has a broken equals/hashCode contract|https://github.com/apache/drill/blob/31d6086c4f814c1d7fc476095611e37cc3d95d1c/exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java#L192]:

{{If two objects are equal according to the equals(Object) method, then calling 
the hashCode method on each of the two objects must produce the same integer 
result.}}

In our case the *{{equals()}}* method depends on 2 fields: name and type, while 
the *{{hashCode()}}* method depends on 3 fields: name, type and child. Two 
fields that are equal can therefore return different hash codes, which leads to 
serious bugs. For example, it can occur in the *SortRecordBatchBuilder* class 
[here|https://github.com/apache/drill/blob/31d6086c4f814c1d7fc476095611e37cc3d95d1c/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/sort/SortRecordBatchBuilder.java#L142]:
{code:java}
if (batches.keySet().size() > 1) {
   throw UserException.validationError(null)
      .message("Sort currently only supports a single schema.")
      .build(logger);
}
{code}
*batches* is an *{{ArrayListMultimap<BatchSchema, RecordBatchData>}}*. When a 
*{{RecordBatchData}}* is inserted under a *{{BatchSchema}}* key, the behavior 
is unexpected, because the *{{BatchSchema}}* hashCode is based on the hashCode 
of *MaterializedField*:
{code:java}
@Override
public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((fields == null) ? 0 : fields.hashCode());
  result = prime * result + ((selectionVectorMode == null) ? 0 : selectionVectorMode.hashCode());
  return result;
}
{code}
So *{{RecordBatchData}}* instances with equal *{{BatchSchema}}* keys can be 
added to the *{{ArrayListMultimap}}* under different keys, in which case the 
{{keySet().size() > 1}} check above throws. This is not a common situation; it 
is most easily reproduced with JSON tables.
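
The effect can be illustrated outside Drill with a minimal sketch. Everything in it is a hypothetical stand-in: {{BrokenField}} and {{ConsistentField}} only mimic the field sets described above and are not the real *MaterializedField*, a {{List}} of fields stands in for *BatchSchema*, and a plain {{HashMap}} stands in for the multimap. The consistent variant is one possible direction for a fix, not necessarily the one Drill should take.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; these are not Drill classes.
final class ContractSketch {

  // equals() ignores 'child' while hashCode() includes it,
  // which is the mismatch described in this issue.
  static final class BrokenField {
    final String name;
    final String type;
    final String child;

    BrokenField(String name, String type, String child) {
      this.name = name;
      this.type = type;
      this.child = child;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof BrokenField)) {
        return false;
      }
      BrokenField other = (BrokenField) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type, child);
    }
  }

  // Same equals(), but hashCode() uses exactly the fields that equals() uses.
  static final class ConsistentField {
    final String name;
    final String type;
    final String child;

    ConsistentField(String name, String type, String child) {
      this.name = name;
      this.type = type;
      this.child = child;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof ConsistentField)) {
        return false;
      }
      ConsistentField other = (ConsistentField) o;
      return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  // Uses two "schemas" (lists of fields) as map keys and counts the distinct keys.
  static int distinctKeys(List<?> schemaA, List<?> schemaB) {
    Map<List<?>, String> batches = new HashMap<>();
    batches.put(schemaA, "batch 1");
    batches.put(schemaB, "batch 2");
    return batches.size();
  }

  public static void main(String[] args) {
    List<BrokenField> a = Arrays.asList(new BrokenField("a", "INT", null));
    List<BrokenField> b = Arrays.asList(new BrokenField("a", "INT", "child"));
    System.out.println("equal schemas: " + a.equals(b));
    System.out.println("keys with broken contract: " + distinctKeys(a, b));

    List<ConsistentField> c = Arrays.asList(new ConsistentField("a", "INT", null));
    List<ConsistentField> d = Arrays.asList(new ConsistentField("a", "INT", "child"));
    System.out.println("keys with consistent contract: " + distinctKeys(c, d));
  }
}
```

With the broken variant the two lists compare equal but hash differently, so the map keeps them as two keys; with the consistent variant they collapse into one. Whether child should be part of both methods or of neither is a design decision this sketch does not settle.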


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
