[ https://issues.apache.org/jira/browse/DRILL-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198754#comment-16198754 ]
ASF GitHub Bot commented on DRILL-5830:
---------------------------------------
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/968#discussion_r143748870
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestLoad.java ---
@@ -119,7 +122,283 @@ public void testLoadValueVector() throws Exception {
}
}
assertEquals(100, recordCount);
+
+ // Free the original vectors
+
+ writableBatch.clear();
+
+ // Free the deserialized vectors
+
batchLoader.clear();
+
+ // The allocator will verify that the frees were done correctly.
+
+ allocator.close();
+ }
+
+ // TODO: Replace this low-level code with RowSet usage once
+ // DRILL-5657 is committed to master.
+
+ private static List<ValueVector> createVectors(BufferAllocator allocator, BatchSchema schema, int i) {
+ final List<ValueVector> vectors = new ArrayList<>();
+ for (MaterializedField field : schema) {
+ @SuppressWarnings("resource")
+ ValueVector v = TypeHelper.getNewVector(field, allocator);
+ AllocationHelper.allocate(v, 100, 50);
+ v.getMutator().generateTestData(100);
+ vectors.add(v);
+ }
+ return vectors;
+ }
+
+ private static DrillBuf serializeBatch(BufferAllocator allocator, WritableBatch writableBatch) {
+ final ByteBuf[] byteBufs = writableBatch.getBuffers();
+ int bytes = 0;
+ for (ByteBuf buf : byteBufs) {
+ bytes += buf.writerIndex();
+ }
+ final DrillBuf byteBuf = allocator.buffer(bytes);
+ int index = 0;
+ for (ByteBuf buf : byteBufs) {
+ buf.readBytes(byteBuf, index, buf.writerIndex());
+ index += buf.writerIndex();
+ }
+ byteBuf.writerIndex(bytes);
+ return byteBuf;
+ }
+
+ @SuppressWarnings("resource")
+ private boolean loadBatch(BufferAllocator allocator,
+ final RecordBatchLoader batchLoader,
+ BatchSchema schema) throws SchemaChangeException {
+ final List<ValueVector> vectors = createVectors(allocator, schema, 100);
+ final WritableBatch writableBatch = WritableBatch.getBatchNoHV(100, vectors, false);
+ final DrillBuf byteBuf = serializeBatch(allocator, writableBatch);
+ boolean result = batchLoader.load(writableBatch.getDef(), byteBuf);
+ byteBuf.release();
+ writableBatch.clear();
+ return result;
+ }
+
+ @Test
+ public void testSchemaChange() throws SchemaChangeException {
+ final BufferAllocator allocator = RootAllocatorFactory.newRoot(drillConfig);
+ final RecordBatchLoader batchLoader = new RecordBatchLoader(allocator);
+
+ // Initial schema: a: INT, b: VARCHAR
+ // Schema change: N/A
+
+ BatchSchema schema1 = new SchemaBuilder()
+ .add("a", MinorType.INT)
+ .add("b", MinorType.VARCHAR)
+ .build();
+ {
+ assertTrue(loadBatch(allocator, batchLoader, schema1));
+ assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
+ batchLoader.getContainer().zeroVectors();
+ }
+
+ // Same schema
+ // Schema change: No
+
+ {
+ assertFalse(loadBatch(allocator, batchLoader, schema1));
+ assertTrue(schema1.isEquivalent(batchLoader.getSchema()));
+ batchLoader.getContainer().zeroVectors();
+ }
+
+ // Reverse columns: b: VARCHAR, a: INT
+ // Schema change: ?
+
+ {
+ BatchSchema schema = new SchemaBuilder()
+ .add("b", MinorType.VARCHAR)
+ .add("a", MinorType.INT)
+ .build();
+ assertFalse(loadBatch(allocator, batchLoader, schema));
+
+ // Potential bug: see DRILL-5828
+
+ assertTrue(schema.isEquivalent(batchLoader.getSchema()));
+ batchLoader.getContainer().zeroVectors();
+ }
+
+ // Drop a column: a: INT
+ // Schema change: ?
--- End diff --
If we drop a column, it looks like we consider that a schema change has occurred, right?
Should we replace this with `Schema change: Yes`?
> Resolve regressions to MapR DB from DRILL-5546
> ----------------------------------------------
>
> Key: DRILL-5830
> URL: https://issues.apache.org/jira/browse/DRILL-5830
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.12.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> DRILL-5546 added a number of fixes for empty batches. One part of the fix was
> for HBase. Key changes:
> * Add code to expand wildcards (i.e., SELECT *) in the planner.
> * Remove support for wildcards in the HBase record reader.
> As noted in DRILL-5775, this change had the effect of breaking support for
> MapR-DB binary (which is API-compatible with HBase). DRILL-5775 fixes this by
> expanding wildcards in the planner for MapR DB, as was done for HBase in
> DRILL-5546.
> Unfortunately, this change introduced other regressions into the code as
> described by DRILL-5706.
> Investigation of those issues revealed that we should back out the original
> DRILL-5546 changes and go down a different route.
> As it turns out, HBase already had a project push-down rule that expanded
> wildcards. However, that rule didn't work correctly some of the time.
> DRILL-5546 fixed that bug, ensuring that wildcards are expanded (at least in
> the cases tested for this ticket).
> The actual issue turned out to be a bug in the {{RecordBatchLoader}} class,
> which did not consider map contents when detecting a schema change. As a
> result, schemas like (row_key, cf\{}) were treated the same as (row_key,
> cf\{mycol}), and the actual data columns were discarded seemingly at random,
> depending on batch arrival order.
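To illustrate the bug described above, here is a minimal, hypothetical sketch (the class and record names are invented for illustration and are not Drill's actual {{RecordBatchLoader}} code) of a schema-equivalence check that descends into map children. The reported bug is equivalent to skipping the child comparison, so an empty map cf\{} and a populated map cf\{mycol} would wrongly compare as the same schema:

```java
import java.util.List;

public class SchemaEquivalenceSketch {

  // Hypothetical, simplified stand-in for a column schema:
  // a column name, a type name, and (for maps) child columns.
  record Field(String name, String type, List<Field> children) {

    boolean isEquivalent(Field other) {
      if (!name.equals(other.name) || !type.equals(other.type)) {
        return false;
      }
      // The reported bug amounts to skipping this recursive part:
      // two map columns with different children compared as equal.
      if (children.size() != other.children.size()) {
        return false;
      }
      for (int i = 0; i < children.size(); i++) {
        if (!children.get(i).isEquivalent(other.children.get(i))) {
          return false;
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    // (row_key, cf{}) vs. (row_key, cf{mycol}): same top-level column
    // names and types, but different map contents.
    Field emptyCf = new Field("cf", "MAP", List.of());
    Field fullCf = new Field("cf", "MAP",
        List.of(new Field("mycol", "VARCHAR", List.of())));

    // A check that descends into map children reports the difference.
    System.out.println(emptyCf.isEquivalent(emptyCf)); // true
    System.out.println(emptyCf.isEquivalent(fullCf));  // false
  }
}
```

With the child comparison in place, loading a batch whose map columns gained or lost children is detected as a schema change instead of silently reusing the old vectors.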
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)