Daniel Barclay (Drill) created DRILL-3659:
---------------------------------------------
Summary: UnionAllRecordBatch infers wrongly from next() IterOutcome values
Key: DRILL-3659
URL: https://issues.apache.org/jira/browse/DRILL-3659
Project: Apache Drill
Issue Type: Bug
Reporter: Daniel Barclay (Drill)
UnionAllRecordBatch appears to misinterpret the IterOutcome values returned by
the next() method of its upstream batches, drawing incorrect inferences about
what those values mean.
In particular, some switch statements seem to check for NONE vs. OK_NEW_SCHEMA
in order to determine whether there are any rows (instead of explicitly
checking the number of rows). However, OK_NEW_SCHEMA can be returned even when
there are zero rows.
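
The difference between the two checks can be illustrated with a minimal Java
sketch. It is not taken from the Drill source: Outcome, Batch, and both helper
methods are hypothetical, simplified stand-ins for RecordBatch.IterOutcome and
an upstream batch, used only to show why an outcome-based inference breaks for
a zero-row batch that reports a new schema.

```java
// Hypothetical stand-in for RecordBatch.IterOutcome (only three values modeled).
enum Outcome { NONE, OK, OK_NEW_SCHEMA }

// Hypothetical stand-in for the result of calling next() on an upstream batch.
final class Batch {
    final Outcome outcome;
    final int recordCount;

    Batch(Outcome outcome, int recordCount) {
        this.outcome = outcome;
        this.recordCount = recordCount;
    }
}

final class UnionInputCheck {
    // Fragile: treats OK_NEW_SCHEMA as proof that the batch carries rows.
    static boolean hasRowsInferred(Batch b) {
        switch (b.outcome) {
            case NONE:
                return false;
            case OK_NEW_SCHEMA:
                return true;
            default:
                return b.recordCount > 0;
        }
    }

    // Explicit: consults the row count whenever a batch was actually returned.
    static boolean hasRowsExplicit(Batch b) {
        return b.outcome != Outcome.NONE && b.recordCount > 0;
    }

    public static void main(String[] args) {
        // A schema-only batch: new schema, zero rows.
        Batch emptyWithSchema = new Batch(Outcome.OK_NEW_SCHEMA, 0);
        System.out.println("inferred: " + hasRowsInferred(emptyWithSchema));
        System.out.println("explicit: " + hasRowsExplicit(emptyWithSchema));
    }
}
```

For the schema-only batch, the inferred check reports rows where there are
none, while the explicit check does not.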
This apparent latent bug in the union code blocks the fix for DRILL-2288 (having
ScanBatch return OK_NEW_SCHEMA for a zero-row case in which it was wrongly,
per the IterOutcome protocol, returning NONE without first returning
OK_NEW_SCHEMA).
For details of IterOutcome values, see the Javadoc documentation of
RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see
https://github.com/apache/drill/pull/113).
For an environment/code state that exposes the UnionAllRecordBatch problems,
see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which
includes:
- a test that exposes the DRILL-2288 problem;
- an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome
value sequence violations; and
- a corrected (though not yet cleaned up) version of ScanBatch that fixes the
DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem (several
test methods in each of TestUnionAll and TestUnionDistinct fail).
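
As a rough illustration of what detecting IterOutcome sequence violations can
mean, here is a minimal Java sketch. It is not the code of
IteratorValidatorBatchIterator: Outcome and OutcomeSequenceChecker are
hypothetical names, and only the rule "NONE must not come before any
OK_NEW_SCHEMA" is taken from this report. The other two rules (no OK before
OK_NEW_SCHEMA, nothing after NONE) are assumptions made for the example.

```java
// Hypothetical stand-in for RecordBatch.IterOutcome (only three values modeled).
enum Outcome { NONE, OK, OK_NEW_SCHEMA }

final class OutcomeSequenceChecker {
    private boolean schemaSeen = false;
    private boolean finished = false;

    // Returns the outcome unchanged, or throws if it breaks the assumed ordering.
    Outcome check(Outcome o) {
        if (finished) {
            // Assumed rule: nothing may follow NONE.
            throw new IllegalStateException(o + " returned after NONE");
        }
        switch (o) {
            case OK_NEW_SCHEMA:
                schemaSeen = true;
                break;
            case OK:
            case NONE:
                if (!schemaSeen) {
                    // For NONE this is the DRILL-2288 case; for OK it is assumed.
                    throw new IllegalStateException(o + " returned before any OK_NEW_SCHEMA");
                }
                if (o == Outcome.NONE) {
                    finished = true;
                }
                break;
        }
        return o;
    }

    public static void main(String[] args) {
        OutcomeSequenceChecker checker = new OutcomeSequenceChecker();
        checker.check(Outcome.OK_NEW_SCHEMA); // schema-only batch, possibly zero rows
        checker.check(Outcome.NONE);          // accepted: schema was reported first
        try {
            new OutcomeSequenceChecker().check(Outcome.NONE);
        } catch (IllegalStateException e) {
            System.out.println("violation: " + e.getMessage());
        }
    }
}
```

Wrapping each upstream next() call in such a check makes a bare NONE fail
loudly at its source instead of surfacing later as a wrong inference downstream.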
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)