[ https://issues.apache.org/jira/browse/DRILL-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186618#comment-16186618 ]
Paul Rogers commented on DRILL-5828:
------------------------------------
The fix for DRILL-5826 will provide a unit test that demonstrates this issue:
{code}
final RecordBatchLoader batchLoader = new RecordBatchLoader(allocator);
BatchSchema schema1 = new SchemaBuilder()
    .add("a", MinorType.INT)
    .add("b", MinorType.VARCHAR)
    .build();
{
  // Prime the loader with the above schema
  assertTrue(loadBatch(allocator, batchLoader, schema1));
  assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
  batchLoader.getContainer().zeroVectors();
}
{
  // Next batch has a permuted schema
  BatchSchema schema = new SchemaBuilder()
      .add("b", MinorType.VARCHAR)
      .add("a", MinorType.INT)
      .build();
  // Load the batch. Returns false, indicating no schema change
  assertFalse(loadBatch(allocator, batchLoader, schema));
  // But the actual schema has changed (order): it matches the second schema
  assertTrue(schema.isEquivalent(batchLoader.getSchema()));
  batchLoader.getContainer().zeroVectors();
}
{code}
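The mismatch the test exposes can be modeled without any Drill classes. The following is only a sketch: {{SchemaChangeSketch}}, {{sameColumns}} and {{sameOrder}} are hypothetical names, and a schema is reduced to an ordered list of unique column names. It is not how {{RecordBatchLoader}} is implemented; it only shows how an order-insensitive comparison can report "no change" while an order-sensitive one disagrees.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for the scenario above. A schema is modeled as an
// ordered list of unique column names; this is NOT Drill's BatchSchema.
public class SchemaChangeSketch {

    // Order-insensitive check: true if both schemas hold the same set of
    // column names, regardless of position. Assumes names are unique.
    public static boolean sameColumns(List<String> first, List<String> second) {
        return first.size() == second.size()
            && new HashSet<>(first).equals(new HashSet<>(second));
    }

    // Order-sensitive check: true only if the same names appear in the
    // same positions.
    public static boolean sameOrder(List<String> first, List<String> second) {
        return first.equals(second);
    }

    public static void main(String[] args) {
        List<String> batchA = List.of("a", "b");
        List<String> batchB = List.of("b", "a");
        // The two checks disagree for permuted columns.
        System.out.println("same columns: " + sameColumns(batchA, batchB));
        System.out.println("same order: " + sameOrder(batchA, batchB));
    }
}
```

A caller that trusts only the first check would treat (a, b) and (b, a) as the same schema, which is the situation the test captures.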
> RecordBatchLoader permutes column order
> ---------------------------------------
>
> Key: DRILL-5828
> URL: https://issues.apache.org/jira/browse/DRILL-5828
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Priority: Minor
>
> The {{RecordBatchLoader}} class deserializes batches and checks for schema
> changes. As part of investigating DRILL-5826, it seems that
> {{RecordBatchLoader}} detects schema changes as follows:
> * If two batches have the same columns in the same order, no schema change
> occurs. (Fine)
> * If batch A has schema (a, b) while batch B has (b, a), then no schema
> change occurs. (Fine)
> But, in the case of permuted columns (second case above), the
> {{RecordBatchLoader}} returns the column order of the second batch, though it
> says that no schema change has occurred.
> That is, {{RecordBatchLoader}} says that the schema has not changed, but the
> actual schema has changed (the column order is different).
> This is a potential problem: if a downstream batch counts on the same column
> order, then that assumption is violated by the behavior described above.
> Correct behavior would be to coerce the second batch to match the schema of
> the first batch, if the {{RecordBatchLoader}} indicates that no schema change
> occurred.
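One way to picture the proposed coercion is the sketch below. It rests on loud assumptions: {{CoerceOrderSketch}} and {{coerce}} are hypothetical names, and a batch is modeled as an insertion-ordered map from column name to values rather than as Drill vectors. It is an illustration of the idea, not the fix applied in Drill.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "coerce the second batch to match the schema
// of the first batch". A batch is modeled as an insertion-ordered map from
// column name to column values; this is NOT Drill's VectorContainer.
public class CoerceOrderSketch {

    // Returns a copy of the incoming batch whose columns follow the
    // established order. Throws if the column sets differ, since that
    // would be a genuine schema change rather than a permutation.
    public static Map<String, List<Object>> coerce(
            List<String> establishedOrder,
            Map<String, List<Object>> incoming) {
        if (incoming.size() != establishedOrder.size()
                || !incoming.keySet().containsAll(establishedOrder)) {
            throw new IllegalArgumentException("column sets differ: real schema change");
        }
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String name : establishedOrder) {
            result.put(name, incoming.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Second batch arrives as (b, a); the first batch established (a, b).
        Map<String, List<Object>> second = new LinkedHashMap<>();
        second.put("b", List.<Object>of("x", "y"));
        second.put("a", List.<Object>of(1, 2));

        Map<String, List<Object>> coerced = coerce(List.of("a", "b"), second);
        System.out.println(coerced.keySet());
    }
}
```

With this approach, a "no schema change" result and the column order seen downstream stay consistent, at the cost of rebuilding the column mapping for each permuted batch.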