Paul Rogers created DRILL-5828:
----------------------------------

             Summary: RecordBatchLoader permutes column order
                 Key: DRILL-5828
                 URL: https://issues.apache.org/jira/browse/DRILL-5828
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers
            Priority: Minor


The {{RecordBatchLoader}} class deserializes batches and checks for schema 
changes. As part of investigating DRILL-5826, it seems that 
{{RecordBatchLoader}} detects schema changes as follows:

* If two batches have the same columns in the same order, no schema change 
occurs. (Fine)
* If batch A has schema (a, b) while batch B has (b, a), then no schema change 
occurs. (Fine)

But in the case of permuted columns (the second case above), the 
{{RecordBatchLoader}} returns the column order of the second batch, even 
though it reports that no schema change has occurred.

That is, {{RecordBatchLoader}} says that the schema has not changed, but the 
actual schema has changed (the column order is different).
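
To make the distinction concrete, here is a minimal sketch in plain Java. It does not use Drill's classes: {{SchemaCompare}}, {{sameColumns}} and {{sameOrder}} are hypothetical names, and a schema is reduced to a list of unique column names. It only illustrates how an order-insensitive check can report "no change" for batches whose column order differs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not Drill's RecordBatchLoader.
// A schema is modeled as a list of unique column names.
class SchemaCompare {

    // Order-insensitive comparison: true when both schemas hold the same set of names.
    static boolean sameColumns(List<String> first, List<String> second) {
        return first.size() == second.size()
            && new HashSet<>(first).equals(new HashSet<>(second));
    }

    // Order-sensitive comparison: true only when names match position by position.
    static boolean sameOrder(List<String> first, List<String> second) {
        return first.equals(second);
    }

    public static void main(String[] args) {
        List<String> batchA = Arrays.asList("a", "b");
        List<String> batchB = Arrays.asList("b", "a");

        // The column sets match, so an order-insensitive check sees no schema change...
        System.out.println("same columns: " + sameColumns(batchA, batchB));
        // ...yet the positions differ, which a position-based consumer would notice.
        System.out.println("same order:   " + sameOrder(batchA, batchB));
    }
}
```

For (a, b) followed by (b, a), the first check passes and the second fails, which is the gap described above.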

This is a potential problem: if a downstream operator counts on the column 
order staying the same from batch to batch, then that assumption is violated 
by the behavior described above.

Correct behavior would be to coerce the second batch to match the schema of the 
first batch, if the {{RecordBatchLoader}} indicates that no schema change 
occurred.
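
A rough sketch of that coercion, again in plain Java rather than Drill code. {{ColumnReorder}} and {{coerce}} are hypothetical names, columns are modeled as a name-to-value map, and column names are assumed to be unique. The idea is simply to re-emit the incoming columns in the order established by the first batch, and to treat any difference in the column set as a real schema change.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Drill.
class ColumnReorder {

    // Returns the incoming columns rearranged to follow priorOrder.
    // Assumes unique column names. Throws if the column sets differ,
    // since that would be a genuine schema change rather than a permutation.
    static <V> LinkedHashMap<String, V> coerce(List<String> priorOrder, Map<String, V> incoming) {
        if (priorOrder.size() != incoming.size()
                || !incoming.keySet().containsAll(priorOrder)) {
            throw new IllegalArgumentException("Column sets differ: this is a schema change");
        }
        LinkedHashMap<String, V> reordered = new LinkedHashMap<>();
        for (String name : priorOrder) {
            reordered.put(name, incoming.get(name));
        }
        return reordered;
    }

    public static void main(String[] args) {
        // The first batch established the order (a, b); the second arrives as (b, a).
        Map<String, String> second = new LinkedHashMap<>();
        second.put("b", "vector-b");
        second.put("a", "vector-a");

        Map<String, String> coerced = coerce(Arrays.asList("a", "b"), second);
        System.out.println(coerced.keySet());
    }
}
```

A {{LinkedHashMap}} is used because it preserves insertion order, so the result iterates in the first batch's column order.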



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
