Paul Rogers created DRILL-5828:
----------------------------------
Summary: RecordBatchLoader permutes column order
Key: DRILL-5828
URL: https://issues.apache.org/jira/browse/DRILL-5828
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Paul Rogers
Priority: Minor
The {{RecordBatchLoader}} class deserializes batches and checks for schema
changes. As part of investigating DRILL-5826, it seems that
{{RecordBatchLoader}} detects schema changes as follows:
* If two batches have the same columns in the same order, no schema change
occurs. (Fine)
* If batch A has schema (a, b) while batch B has (b, a), then no schema change
occurs. (Fine)
But in the case of permuted columns (the second case above), the
{{RecordBatchLoader}} returns the column order of the second batch, even though
it reports that no schema change has occurred.
That is, {{RecordBatchLoader}} says that the schema has not changed, but the
actual schema has changed (the column order is different).
This is a potential problem: if a downstream batch counts on the same column
order, then that assumption is violated by the behavior described above.
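To make the mismatch concrete, here is a minimal sketch of an order-insensitive schema check. It is not Drill's actual code: the class {{SchemaChangeSketch}} and the method {{isSchemaChanged}} are hypothetical, and a schema is modeled as a plain list of column names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class SchemaChangeSketch {
    // Hypothetical stand-in for the check described above: column names are
    // compared as sets, so a permutation is not reported as a schema change.
    public static boolean isSchemaChanged(List<String> previous, List<String> current) {
        return !new HashSet<>(previous).equals(new HashSet<>(current));
    }

    public static void main(String[] args) {
        List<String> batchA = Arrays.asList("a", "b");
        List<String> batchB = Arrays.asList("b", "a");

        // The check reports no change for the permuted batch...
        System.out.println("schema changed: " + isSchemaChanged(batchA, batchB));
        // ...yet the column order seen by a consumer is different.
        System.out.println("same column order: " + batchA.equals(batchB));
    }
}
```

Both lines print {{false}}: no schema change is reported, yet the column order is not the same. A consumer that reads columns by position would see the values of b where it expects a.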
Correct behavior would be to coerce the second batch to match the schema of the
first batch, if the {{RecordBatchLoader}} indicates that no schema change
occurred.
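One possible shape for such a coercion, as a sketch only: reorder the incoming columns to follow the order remembered from the first batch. The names {{ColumnOrderCoercion}} and {{coerceToOrder}} are hypothetical, and a batch is modeled as a map from column name to values rather than as Drill vectors.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnOrderCoercion {
    // Returns the columns of `incoming` rearranged to follow `referenceOrder`.
    // Assumes column names are unique; rejects batches whose column set differs,
    // since that would be a real schema change rather than a permutation.
    public static <V> Map<String, V> coerceToOrder(List<String> referenceOrder,
                                                   Map<String, V> incoming) {
        if (incoming.size() != referenceOrder.size()
                || !incoming.keySet().containsAll(referenceOrder)) {
            throw new IllegalArgumentException("column sets differ: schema change");
        }
        Map<String, V> result = new LinkedHashMap<>();
        for (String name : referenceOrder) {
            result.put(name, incoming.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Batch B arrives as (b, a); batch A established the order (a, b).
        Map<String, List<Integer>> batchB = new LinkedHashMap<>();
        batchB.put("b", Arrays.asList(10, 20));
        batchB.put("a", Arrays.asList(1, 2));

        Map<String, List<Integer>> coerced = coerceToOrder(Arrays.asList("a", "b"), batchB);
        System.out.println(coerced.keySet()); // prints [a, b]
    }
}
```

The sketch only rearranges references to the column data, so the cost is proportional to the number of columns, not the number of rows.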
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)