[ 
https://issues.apache.org/jira/browse/DRILL-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186618#comment-16186618
 ] 

Paul Rogers commented on DRILL-5828:
------------------------------------

The fix for DRILL-5826 will provide a unit test that demonstrates this issue:

{code}
    final RecordBatchLoader batchLoader = new RecordBatchLoader(allocator);
    BatchSchema schema1 = new SchemaBuilder()
        .add("a", MinorType.INT)
        .add("b", MinorType.VARCHAR)
        .build();
    {
      // Prime the loader with the above schema
      assertTrue(loadBatch(allocator, batchLoader, schema1));
      assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
      batchLoader.getContainer().zeroVectors();
    }
    {
      // Next batch has a permuted schema
      BatchSchema schema = new SchemaBuilder()
          .add("b", MinorType.VARCHAR)
          .add("a", MinorType.INT)
          .build();
      // Load the batch. Returns false, indicating no schema change
      assertFalse(loadBatch(allocator, batchLoader, schema));
      // But, the actual schema has changed (order): it matches the second schema
      assertTrue(schema.isEquivalent(batchLoader.getSchema()));
      batchLoader.getContainer().zeroVectors();
    }
{code}


> RecordBatchLoader permutes column order
> ---------------------------------------
>
>                 Key: DRILL-5828
>                 URL: https://issues.apache.org/jira/browse/DRILL-5828
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The {{RecordBatchLoader}} class deserializes batches and checks for schema 
> changes. As part of investigating DRILL-5826, it seems that 
> {{RecordBatchLoader}} detects schema changes as follows:
> * If two batches have the same columns in the same order, no schema change 
> occurs. (Fine)
> * If batch A has schema (a, b) while batch B has (b, a), then no schema 
> change occurs. (Fine)
> But, in the case of permuted columns (second case above), the 
> {{RecordBatchLoader}} returns the column order of the second batch, though it 
> says that no schema change has occurred.
> That is, {{RecordBatchLoader}} says that the schema has not changed, but the 
> actual schema has changed (the column order changed).
> This is a potential problem: if a downstream operator counts on the same column 
> order, then that assumption is violated by the behavior described above.
> Correct behavior would be to coerce the second batch to match the schema of 
> the first batch, if the {{RecordBatchLoader}} indicates that no schema change 
> occurred.
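
The coercion proposed in the description can be sketched roughly as follows. This is a minimal, standalone illustration, not Drill code: the class {{SchemaOrderSketch}} and its methods are hypothetical, schemas are modeled as plain lists of column names, and column names are assumed to be unique within a schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch; none of these names exist in Drill.
public class SchemaOrderSketch {

  /**
   * True if both schemas hold the same column names, ignoring order.
   * Assumes column names are unique within each schema.
   */
  public static boolean sameColumns(List<String> first, List<String> second) {
    return first.size() == second.size()
        && new HashSet<>(first).equals(new HashSet<>(second));
  }

  /**
   * Rearranges the incoming values into the column order of the
   * established (first) schema. Callers should check sameColumns() first.
   */
  public static List<Object> coerceOrder(List<String> established,
                                         List<String> incoming,
                                         List<Object> incomingValues) {
    List<Object> result = new ArrayList<>(established.size());
    for (String column : established) {
      // Look up where this column sits in the incoming batch.
      result.add(incomingValues.get(incoming.indexOf(column)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("a", "b");
    List<String> second = Arrays.asList("b", "a");
    // Same columns, different order: "no schema change" in the loader's sense.
    System.out.println(sameColumns(first, second));
    // Values arrive in (b, a) order and are emitted in (a, b) order.
    System.out.println(coerceOrder(first, second, Arrays.<Object>asList("x", 1)));
  }
}
```

With this approach, reporting "no schema change" and preserving the first batch's column order stay consistent, so a downstream consumer that indexes columns by position is not surprised.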



