[jira] [Commented] (DRILL-5830) Resolve regressions to MapR DB from DRILL-5546

ASF GitHub Bot (JIRA) Wed, 11 Oct 2017 12:41:34 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200837#comment-16200837 ]


ASF GitHub Bot commented on DRILL-5830:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/968#discussion_r144114556
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestLoad.java ---
    @@ -119,7 +122,283 @@ public void testLoadValueVector() throws Exception {
           }
         }
         assertEquals(100, recordCount);
    +
    +    // Free the original vectors
    +
    +    writableBatch.clear();
    +
    +    // Free the deserialized vectors
    +
         batchLoader.clear();
    +
    +    // The allocator will verify that the frees were done correctly.
    +
    +    allocator.close();
    +  }
    +
    +  // TODO: Replace this low-level code with RowSet usage once
    +  // DRILL-5657 is committed to master.
    +
    +  private static List<ValueVector> createVectors(BufferAllocator allocator, BatchSchema schema, int i) {
    +    final List<ValueVector> vectors = new ArrayList<>();
    +    for (MaterializedField field : schema) {
    +      @SuppressWarnings("resource")
    +      ValueVector v = TypeHelper.getNewVector(field, allocator);
    +      AllocationHelper.allocate(v, 100, 50);
    +      v.getMutator().generateTestData(100);
    +      vectors.add(v);
    +    }
    +    return vectors;
    +  }
    +
    +  private static DrillBuf serializeBatch(BufferAllocator allocator, WritableBatch writableBatch) {
    +    final ByteBuf[] byteBufs = writableBatch.getBuffers();
    +    int bytes = 0;
    +    for (ByteBuf buf : byteBufs) {
    +      bytes += buf.writerIndex();
    +    }
    +    final DrillBuf byteBuf = allocator.buffer(bytes);
    +    int index = 0;
    +    for (ByteBuf buf : byteBufs) {
    +      buf.readBytes(byteBuf, index, buf.writerIndex());
    +      index += buf.writerIndex();
    +    }
    +    byteBuf.writerIndex(bytes);
    +    return byteBuf;
    +  }
    +
    +  @SuppressWarnings("resource")
    +  private boolean loadBatch(BufferAllocator allocator,
    +      final RecordBatchLoader batchLoader,
    +      BatchSchema schema) throws SchemaChangeException {
    +    final List<ValueVector> vectors = createVectors(allocator, schema, 100);
    +    final WritableBatch writableBatch = WritableBatch.getBatchNoHV(100, vectors, false);
    +    final DrillBuf byteBuf = serializeBatch(allocator, writableBatch);
    +    boolean result = batchLoader.load(writableBatch.getDef(), byteBuf);
    +    byteBuf.release();
         writableBatch.clear();
    +    return result;
    +  }
    +
    +  @Test
    +  public void testSchemaChange() throws SchemaChangeException {
    +    final BufferAllocator allocator = RootAllocatorFactory.newRoot(drillConfig);
    +    final RecordBatchLoader batchLoader = new RecordBatchLoader(allocator);
    +
    +    // Initial schema: a: INT, b: VARCHAR
    +    // Schema change: N/A
    +
    +    BatchSchema schema1 = new SchemaBuilder()
    +        .add("a", MinorType.INT)
    +        .add("b", MinorType.VARCHAR)
    +        .build();
    +    {
    +      assertTrue(loadBatch(allocator, batchLoader, schema1));
    +      assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
    +      batchLoader.getContainer().zeroVectors();
    +    }
    +
    +    // Same schema
    +    // Schema change: No
    +
    +    {
    +      assertFalse(loadBatch(allocator, batchLoader, schema1));
    +      assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
    +      batchLoader.getContainer().zeroVectors();
    +    }
    +
    +    // Reverse columns: b: VARCHAR, a: INT
    +    // Schema change: ?
    +
    +    {
    +      BatchSchema schema = new SchemaBuilder()
    +          .add("b", MinorType.VARCHAR)
    +          .add("a", MinorType.INT)
    +          .build();
    +      assertFalse(loadBatch(allocator, batchLoader, schema));
    +
    +      // Potential bug: see DRILL-5828
    +
    +      assertTrue(schema.isEquivalent(batchLoader.getSchema()));
    +      batchLoader.getContainer().zeroVectors();
    +    }
    +
    +    // Drop a column: a: INT
    +    // Schema change: ?
    --- End diff --
    
    Fixed.
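The serializeBatch() helper in the diff makes two passes over the batch's buffers: one to sum their sizes, and one to copy each buffer into a single contiguous DrillBuf at a running offset. The following is a minimal sketch of the same two-pass shape using plain java.nio.ByteBuffer, assuming no Drill or Netty dependency; ConcatBuffers and concat() are hypothetical names. One difference: Netty's readBytes() advances each source buffer's reader index, while this sketch copies from duplicate() views and leaves the sources untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ConcatBuffers {
    // Hypothetical helper: copies the readable bytes of each buffer, in order,
    // into one contiguous buffer (same two-pass shape as serializeBatch()).
    static ByteBuffer concat(List<ByteBuffer> parts) {
        // Pass 1: total size (serializeBatch() sums writerIndex() instead).
        int total = 0;
        for (ByteBuffer part : parts) {
            total += part.remaining();
        }
        // Pass 2: copy each part at the running offset.
        ByteBuffer out = ByteBuffer.allocate(total);
        for (ByteBuffer part : parts) {
            // duplicate() shares the bytes but has its own position,
            // so the source buffer is not consumed.
            out.put(part.duplicate());
        }
        out.flip(); // make the written bytes readable from position 0
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap("abc".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer b = ByteBuffer.wrap("de".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer joined = concat(List.of(a, b));
        byte[] bytes = new byte[joined.remaining()];
        joined.get(bytes);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // abcde
    }
}
```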


> Resolve regressions to MapR DB from DRILL-5546
> ----------------------------------------------
>
>                 Key: DRILL-5830
>                 URL: https://issues.apache.org/jira/browse/DRILL-5830
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> DRILL-5546 added a number of fixes for empty batches. One part of the fix was 
> for HBase. Key changes:
> * Add code to expand wildcards (i.e. SELECT *) in the planner.
> * Remove support for wildcards in the HBase record reader.
> As noted in DRILL-5775, this change broke support for MapR-DB binary tables
> (which are API-compatible with HBase). DRILL-5775 fixes this by expanding
> wildcards in the planner for MapR-DB, as DRILL-5546 did for HBase.
> Unfortunately, this change introduced other regressions, as described in
> DRILL-5706.
> Investigation of those issues revealed that we should back out the original 
> DRILL-5546 changes and go down a different route.
> As it turns out, HBase already had a project push-down rule that expanded
> wildcards. However, that rule did not always work correctly.
> DRILL-5546 fixed that bug, ensuring that wildcards are expanded (at least in
> the cases tested for this ticket).
> The actual issue turned out to be a bug in the {{RecordBatchLoader}} class,
> which did not consider map contents when detecting a schema change. As a
> result, batches with schemas like (row_key, cf{}) were treated the same as
> (row_key, cf{mycol}), and the actual data columns were discarded
> unpredictably, depending on batch arrival order.
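The schema-change bug described above can be illustrated with a simplified model. This is a sketch, not Drill's RecordBatchLoader code: Field, shallowEquivalent() and deepEquivalent() are hypothetical, and the column types are illustrative. A comparison that stops at top-level columns sees (row_key, cf{}) and (row_key, cf{mycol}) as the same schema; one that recurses into map members reports the change.

```java
import java.util.Arrays;
import java.util.List;

public class MapSchemaCheck {
    // Hypothetical, simplified column descriptor: a name, a type label, and
    // (for MAP columns) the member columns. Not a Drill class.
    record Field(String name, String type, List<Field> children) {
        static Field scalar(String name, String type) {
            return new Field(name, type, List.of());
        }

        static Field map(String name, Field... members) {
            return new Field(name, "MAP", Arrays.asList(members));
        }
    }

    // Compares only top-level column names and types; map members are ignored,
    // so (row_key, cf{}) and (row_key, cf{mycol}) look identical.
    static boolean shallowEquivalent(List<Field> a, List<Field> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Field x = a.get(i);
            Field y = b.get(i);
            if (!x.name().equals(y.name()) || !x.type().equals(y.type())) {
                return false;
            }
        }
        return true;
    }

    // Same check, but also recurses into map members, so adding or dropping
    // a member column counts as a schema change.
    static boolean deepEquivalent(List<Field> a, List<Field> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Field x = a.get(i);
            Field y = b.get(i);
            if (!x.name().equals(y.name()) || !x.type().equals(y.type())
                    || !deepEquivalent(x.children(), y.children())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Field> emptyFamily = List.of(
                Field.scalar("row_key", "VARBINARY"), Field.map("cf"));
        List<Field> populatedFamily = List.of(
                Field.scalar("row_key", "VARBINARY"),
                Field.map("cf", Field.scalar("mycol", "VARBINARY")));
        System.out.println(shallowEquivalent(emptyFamily, populatedFamily)); // true: change missed
        System.out.println(deepEquivalent(emptyFamily, populatedFamily));    // false: change detected
    }
}
```

Note that this sketch compares columns by position, so it would also report reordered columns as a schema change.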



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
