[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

Daniel Barclay (Drill) (JIRA) Tue, 03 Nov 2015 21:43:57 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410
 ]


Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/4/15 5:42 AM:
------------------------------------------------------------------------

Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning 
{{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't 
get its schema, even for static-schema sources, or even get triggered to update 
their own schemas).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was 
not documented clearly (so developers didn't know exactly what to expect or 
provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of 
{{IterOutcome}} values (so developers weren't notified about invalid sequences).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} 
correctly (so it reported spurious/incorrect schema-change and/or 
empty-/non-empty input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} used a short-circuit OR ({{||}}) that 
could skip the call to {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't 
reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} 
notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field 
already existed in the batch (so in that case it forcibly changed the type to 
{{NullableIntVector}}, causing schema changes and downstream exceptions). 
\[Note:  DRILL-2288 does not address other problems with {{NullableIntVector}} 
dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with 
multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left 
undetected and unresolved).  \[Note: DRILL-2288 addresses only one test table 
(increasing the number of regions on the other test tables exposed at least one 
other problem; others remain).]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every HBase column 
family and every requested HBase column (so {{NullableIntVector}} dummy columns 
got created, causing spurious schema changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero 
({{OrderedPartitionRecordBatch.recordCount}}, 
{{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so 
downstream code tried to access elements of (correctly) empty vectors, yielding 
{{IndexOutOfBoundsException}} with a message like "... {{range (0, 0)}}").

10.  {{RecordBatchLoader}}'s record count was not reset to zero by 
{{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements 
of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} with a 
message like "... {{range (0, 0)}}").

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the 
returned length and the length of sibling vectors (so 
{{MapVector.getObject(int)}} got {{IndexOutOfBoundsException}} with a message 
like "... {{range (0, 0)}}").  \[Note: DRILL-2288 does not address the root problem.]

12.  {{BaseTestQuery.printResult(...)}} skipped deallocation calls for 
zero-record batches (so reading a zero-row batch caused a memory leak, reported 
at Drillbit shutdown time).

13.  {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited 
identifiers containing a period, a form that Drill can't handle (so the test 
failed when run with multiple fragments).
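
For items 1 to 3, here is a minimal sketch of the kind of sequence check a validator could perform. It assumes a simplified protocol (an {{OK_NEW_SCHEMA}} must come before any {{OK}} or {{NONE}}, and nothing may follow {{NONE}} or {{STOP}}). The {{Outcome}} enum and {{firstViolation}} method are hypothetical; they are not Drill's {{RecordBatch.IterOutcome}} or {{IteratorValidatorBatchIterator}}, and the real protocol has more values and cases.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the IterOutcome sequencing rule implied by
 * items 1 to 3. NOT Drill's IterOutcome or IteratorValidatorBatchIterator.
 */
public class IterOutcomeSequenceCheck {

  /** Assumed subset of outcome values (names borrowed from the comment). */
  enum Outcome { OK_NEW_SCHEMA, OK, NONE, STOP }

  /**
   * Returns null if the sequence of next() outcomes follows the simplified
   * protocol, otherwise a description of the first violation.
   */
  static String firstViolation(List<Outcome> outcomes) {
    boolean schemaSeen = false;   // has OK_NEW_SCHEMA been returned yet?
    boolean terminated = false;   // has NONE or STOP been returned yet?
    for (int i = 0; i < outcomes.size(); i++) {
      if (terminated) {
        return "next() called again after terminal outcome at index " + (i - 1);
      }
      Outcome o = outcomes.get(i);
      switch (o) {
        case OK_NEW_SCHEMA:
          schemaSeen = true;
          break;
        case OK:
          if (!schemaSeen) {
            return "OK at index " + i + " before any OK_NEW_SCHEMA";
          }
          break;
        case NONE:
          if (!schemaSeen) {
            // The zero-row ScanBatch bug from item 1: end of data reported
            // without the schema ever being delivered downstream.
            return "NONE at index " + i + " before any OK_NEW_SCHEMA";
          }
          terminated = true;
          break;
        case STOP:
          terminated = true;  // abnormal termination, allowed at any point here
          break;
      }
    }
    return null;
  }
}
```

For example, {{firstViolation(List.of(Outcome.NONE))}} reports the zero-row violation from item 1, while the sequence {{OK_NEW_SCHEMA, NONE}} passes.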

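The short-circuit pitfall in item 5 can be sketched in isolation. This assumes, as item 5 implies, that the nested getter both reports and clears the schema-change flag; {{ResettingFlag}}, {{getAndReset}}, and the two {{isNewSchema...}} methods are hypothetical stand-ins, not Drill's {{ScanBatch.Mutator}} or {{SchemaChangeCallBack}}.

```java
/**
 * Hypothetical illustration of the short-circuit pitfall in item 5. Names are
 * invented; they are not Drill's ScanBatch.Mutator or SchemaChangeCallBack.
 */
public class ShortCircuitResetDemo {

  /** Stand-in for a nested callback whose getter also clears its flag. */
  static class ResettingFlag {
    private boolean changed;

    void markChanged() {
      changed = true;
    }

    /** Returns whether a change was recorded since the last call, then clears it. */
    boolean getAndReset() {
      boolean result = changed;
      changed = false;
      return result;
    }
  }

  /** Buggy: when localChange is true, || never calls getAndReset(). */
  static boolean isNewSchemaBuggy(boolean localChange, ResettingFlag nested) {
    return localChange || nested.getAndReset();
  }

  /** Fixed: always evaluate the read-and-reset call, then combine the results. */
  static boolean isNewSchemaFixed(boolean localChange, ResettingFlag nested) {
    boolean nestedChange = nested.getAndReset();
    return localChange || nestedChange;
  }
}
```

With the buggy form, a nested change that coincides with a local change is never cleared, so the next call reports a second, spurious schema change. The non-short-circuit {{|}} operator would also evaluate both sides; a separate local variable just makes the side effect explicit.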

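Item 6's missing check can be modeled with a plain map from column name to type name standing in for a batch's vectors. This is a hypothetical sketch, not {{JsonRecordReader}}'s actual code; {{"NullableInt"}} merely stands in for a {{NullableIntVector}} dummy column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of item 6: a batch is a map from column name to type
 * name. This is not JsonRecordReader's actual code.
 */
public class EnsureAtLeastOneFieldDemo {

  static final String DUMMY_TYPE = "NullableInt";  // stands in for NullableIntVector

  /** Buggy: adds the dummy column unconditionally, clobbering an existing type. */
  static void ensureAtLeastOneFieldBuggy(Map<String, String> columns, String firstColumn) {
    columns.put(firstColumn, DUMMY_TYPE);
  }

  /** Fixed: adds a dummy column only if the batch has no fields at all. */
  static void ensureAtLeastOneFieldFixed(Map<String, String> columns, String firstColumn) {
    if (columns.isEmpty()) {
      columns.put(firstColumn, DUMMY_TYPE);
    }
  }

  /** Helper that builds a one-column batch for experiments. */
  static Map<String, String> batchWith(String name, String type) {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put(name, type);
    return columns;
  }
}
```

In the buggy version, a column that already existed with another type is silently retyped, which downstream operators see as a schema change.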

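Items 9 and 10 share one failure shape: a record count that is updated only for non-empty input keeps a stale value when an empty batch arrives, and downstream code then indexes past the end of (correctly) empty data. The sketch below is hypothetical; {{OutputBatch}}, {{loadBuggy}}, {{loadFixed}}, and {{sum}} are invented names, and a plain {{int[]}} stands in for a value vector.

```java
/**
 * Hypothetical model of items 9 and 10: an operator keeps a recordCount field
 * alongside its output data. Names are invented, not Drill's RecordBatch API.
 */
public class RecordCountResetDemo {

  static class OutputBatch {
    int recordCount;
    int[] values = new int[0];

    /** Buggy: only updates the count for non-empty input, so a stale count survives. */
    void loadBuggy(int[] incoming) {
      values = incoming.clone();
      if (incoming.length > 0) {
        recordCount = incoming.length;
      }
    }

    /** Fixed: always sets the count from the incoming data, including to zero. */
    void loadFixed(int[] incoming) {
      values = incoming.clone();
      recordCount = incoming.length;
    }

    /** Downstream consumer that trusts recordCount, as downstream operators do. */
    int sum() {
      int total = 0;
      for (int i = 0; i < recordCount; i++) {
        total += values[i];  // throws if recordCount exceeds values.length
      }
      return total;
    }
  }
}
```

After loading three values and then an empty batch, the buggy batch still reports three records and {{sum()}} throws an {{ArrayIndexOutOfBoundsException}} (a subclass of {{IndexOutOfBoundsException}}), analogous to the "... {{range (0, 0)}}" failures.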

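Item 12's leak has a common cause: an early exit for empty batches that also skips the release call. A minimal sketch, assuming a hypothetical {{FakeBatch}} whose {{release()}} stands in for freeing a batch's buffers (this is not {{BaseTestQuery.printResult(...)}} itself), shows how moving the release into a {{finally}} block covers the zero-row case.

```java
import java.util.List;

/**
 * Hypothetical model of item 12. FakeBatch stands in for a record batch that
 * owns allocated buffers; it is not Drill's API.
 */
public class ReleaseOnEmptyBatchDemo {

  static class FakeBatch {
    final int rowCount;
    boolean released;

    FakeBatch(int rowCount) {
      this.rowCount = rowCount;
    }

    void release() {
      released = true;  // in real code: free the batch's buffers
    }
  }

  /** Buggy: the early continue for an empty batch skips release(), leaking it. */
  static void printBuggy(List<FakeBatch> batches, StringBuilder out) {
    for (FakeBatch b : batches) {
      if (b.rowCount == 0) {
        continue;
      }
      out.append(b.rowCount).append(" rows\n");
      b.release();
    }
  }

  /** Fixed: release() runs in finally, so empty batches are released too. */
  static void printFixed(List<FakeBatch> batches, StringBuilder out) {
    for (FakeBatch b : batches) {
      try {
        if (b.rowCount == 0) {
          continue;  // the finally block still runs
        }
        out.append(b.rowCount).append(" rows\n");
      } finally {
        b.release();
      }
    }
  }
}
```

A {{continue}} (or an exception) inside the {{try}} still executes the {{finally}} block, so every batch is released exactly once regardless of its row count.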

was (Author: dsbos):
Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning 
{{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't 
get its schema, even for static-schema sources, or even get triggered to update 
their own schemas).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was 
not documented clearly (so developers didn't know exactly what to expect or 
provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of 
{{IterOutcome}} values (so developers weren't notified about invalid sequences).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} 
correctly (so it reported spurious/incorrect schema-change and/or 
empty-/non-empty input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} used a short-circuit OR ({{||}}) that 
could skip the call to {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't 
reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} 
notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field 
already existed in the batch (so in that case it forcibly changed the type to 
{{NullableIntVector}}, causing schema changes and downstream exceptions). 
\[Note:  DRILL-2288 does not address other problems with {{NullableIntVector}} 
dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with 
multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left 
undetected and unresolved).  \[Note: DRILL-2288 addresses only one test table 
(increasing the number of regions on the other test tables exposed at least one 
other problem; others remain).]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family 
(so {{NullableIntVector}} dummy columns got created, causing spurious schema 
changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero 
({{OrderedPartitionRecordBatch.recordCount}}, 
{{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so 
downstream code tried to access elements of (correctly) empty vectors, yielding 
{{IndexOutOfBoundsException}} with a message like "... {{range (0, 0)}}").

10.  {{RecordBatchLoader}}'s record count was not reset to zero by 
{{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements 
of (correctly) empty vectors, yielding {{IndexOutOfBoundsException}} with a 
message like "... {{range (0, 0)}}").

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the 
returned length and the length of sibling vectors (so 
{{MapVector.getObject(int)}} got {{IndexOutOfBoundsException}} with a message 
like "... {{range (0, 0)}}").  \[Note: DRILL-2288 does not address the root problem.]

12.  {{BaseTestQuery.printResult(...)}} skipped deallocation calls for 
zero-record batches (so reading a zero-row batch caused a memory leak, reported 
at Drillbit shutdown time).

13.  {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited 
identifiers containing a period, a form that Drill can't handle (so the test 
failed when run with multiple fragments).




> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing 
> JDBC metadata (schema) for 0-row results...]
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2288
>                 URL: https://issues.apache.org/jira/browse/DRILL-2288
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Information Schema
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>             Fix For: 1.3.0
>
>         Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up 
> (getColumnCount() returns zero, and trying to access any other metadata 
> throws IndexOutOfBoundsException) for a result set with zero rows, at least 
> for one from DatabaseMetaData.getColumns(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
