Rymar Maksym created DRILL-7812:
-----------------------------------
Summary: Broken equals/hashcode contract
Key: DRILL-7812
URL: https://issues.apache.org/jira/browse/DRILL-7812
Project: Apache Drill
Issue Type: Bug
Reporter: Rymar Maksym
Assignee: Rymar Maksym
The *MaterializedField* class [has a broken equals/hashCode
contract|https://github.com/apache/drill/blob/31d6086c4f814c1d7fc476095611e37cc3d95d1c/exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java#L192]:
{{If two objects are equal according to the equals(Object) method, then calling
the hashCode method on each of the two objects must produce the same integer
result.}}
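The sketch below is a minimal illustration of such a violation. It uses a hypothetical simplified class {{SimpleField}}, not Drill's actual *MaterializedField*, whose {{equals()}} compares only name and type while {{hashCode()}} also hashes a {{child}} value:
{code:java}
import java.util.Objects;

// Hypothetical simplified field (NOT Drill's MaterializedField).
final class SimpleField {
    final String name;
    final String type;
    final String child; // extra state that equals() ignores

    SimpleField(String name, String type, String child) {
        this.name = name;
        this.type = type;
        this.child = child;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleField)) return false;
        SimpleField other = (SimpleField) o;
        // Equality considers only name and type.
        return Objects.equals(name, other.name) && Objects.equals(type, other.type);
    }

    @Override
    public int hashCode() {
        // Intentional bug: also hashes child, which equals() ignores.
        return Objects.hash(name, type, child);
    }
}

public class ContractDemo {
    public static void main(String[] args) {
        SimpleField a = new SimpleField("col", "INT", null);
        SimpleField b = new SimpleField("col", "INT", "nested");
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints false
    }
}
{code}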
In our case, the *{{equals()}}* method depends on two fields, name and type, while
the *{{hashCode()}}* method depends on three: name, type and child. This leads to
serious bugs. For example, the problem can occur in the *SortRecordBatchBuilder* class
[here|https://github.com/apache/drill/blob/31d6086c4f814c1d7fc476095611e37cc3d95d1c/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/sort/SortRecordBatchBuilder.java#L142]:
{code:java}
if (batches.keySet().size() > 1) {
throw UserException.validationError(null)
.message("Sort currently only supports a single schema.")
.build(logger);
}
{code}
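The effect on this check can be sketched with plain {{java.util}} classes. Guava's {{ArrayListMultimap}} keeps its keys in a {{HashMap}}, so a {{HashMap<K, List<V>>}} serves as a stand-in below; {{SchemaKey}} and {{group()}} are hypothetical names used only for illustration:
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class MultimapSplitDemo {
    // Hypothetical schema key: equals() ignores 'child', hashCode() does not.
    static final class SchemaKey {
        final String name;
        final String child;

        SchemaKey(String name, String child) {
            this.name = name;
            this.child = child;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SchemaKey && Objects.equals(name, ((SchemaKey) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, child);
        }
    }

    // Groups values by key, like a list multimap backed by a HashMap.
    static <K, V> Map<K, List<V>> group(List<K> keys, List<V> values) {
        Map<K, List<V>> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            map.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        SchemaKey k1 = new SchemaKey("a", null);
        SchemaKey k2 = new SchemaKey("a", "x");
        Map<SchemaKey, List<String>> batches = group(List.of(k1, k2), List.of("batch1", "batch2"));
        System.out.println(k1.equals(k2));              // prints true
        System.out.println(batches.keySet().size());    // prints 2: the single-schema check fails
    }
}
{code}
Because {{HashMap}} compares stored hash codes before calling {{equals()}}, two equal keys with different hash codes end up as separate entries, so {{keySet().size()}} exceeds 1.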
*Batches* is an *{{ArrayListMultimap<BatchSchema, RecordBatchData>}}*, and when a
*{{RecordBatchData}}* is inserted with a *{{BatchSchema}}* key, unexpected behavior
occurs, because the *{{BatchSchema}}* hashCode is based on the hashCodes of its
MaterializedFields:
{code:java}
@Override
public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result + ((fields == null) ? 0 : fields.hashCode());
  result = prime * result + ((selectionVectorMode == null) ? 0 : selectionVectorMode.hashCode());
  return result;
}
{code}
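The propagation from field to schema can be sketched as follows, assuming (as in the snippet above) that the schema hash folds in {{fields.hashCode()}} for a {{List}} of fields. {{Elem}} is a hypothetical stand-in for *MaterializedField*:
{code:java}
import java.util.List;
import java.util.Objects;

public class ListHashDemo {
    // Hypothetical element whose hashCode() covers more state than equals().
    static final class Elem {
        final String name;
        final String extra; // ignored by equals(), included in hashCode()

        Elem(String name, String extra) {
            this.name = name;
            this.extra = extra;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Elem && Objects.equals(name, ((Elem) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, extra);
        }
    }

    public static void main(String[] args) {
        List<Elem> fields1 = List.of(new Elem("a", null));
        List<Elem> fields2 = List.of(new Elem("a", "x"));
        // List.equals() compares elements with equals(), so the lists are equal.
        System.out.println(fields1.equals(fields2));                  // prints true
        // List.hashCode() folds in each element's hashCode(), so the mismatch propagates.
        System.out.println(fields1.hashCode() == fields2.hashCode()); // prints false
    }
}
{code}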
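So *{{RecordBatchData}}* instances with equal *{{BatchSchema}}* keys can be added to
the *{{ArrayListMultimap}}* under different keys, which makes
*{{batches.keySet().size()}}* greater than 1 and triggers the "Sort currently only
supports a single schema." error. This is not a common situation, and it is most
easily reproduced with JSON tables.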
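One possible direction for a fix, sketched on a hypothetical simplified class rather than the real *MaterializedField*, is to make {{hashCode()}} use exactly the fields that {{equals()}} compares:
{code:java}
import java.util.Objects;

// Hypothetical simplified field (NOT Drill's MaterializedField).
final class FixedField {
    final String name;
    final String type;
    final String child; // excluded from both equals() and hashCode()

    FixedField(String name, String type, String child) {
        this.name = name;
        this.type = type;
        this.child = child;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FixedField)) return false;
        FixedField other = (FixedField) o;
        return Objects.equals(name, other.name) && Objects.equals(type, other.type);
    }

    @Override
    public int hashCode() {
        // Same fields as equals(): equal objects always share a hash code.
        return Objects.hash(name, type);
    }
}

public class FixDemo {
    public static void main(String[] args) {
        FixedField a = new FixedField("col", "INT", null);
        FixedField b = new FixedField("col", "INT", "nested");
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
    }
}
{code}
The alternative, adding child to {{equals()}}, would also restore the contract, but it would likely change which schemas are treated as equal, so it is a larger behavioral change.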
--
This message was sent by Atlassian Jira
(v8.3.4#803005)