[ 
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7325:
-------------------------------
    Description: 
See DRILL-7324. The following are problems found because some operators fail to 
set the record count for their containers.

h4. Scan

{{TestComplexTypeReader}}, on cluster setup, using the {{PojoRecordReader}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from ScanBatch
ScanBatch: Container record count not set
{noformat}

Reason: {{ScanBatch}} never sets the record count of its container (this is a 
generic issue, not specific to the {{PojoRecordReader}}).

h4. Filter

{{TestComplexTypeReader.testNonExistentFieldConverting()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from FilterRecordBatch
FilterRecordBatch: Container record count not set
{noformat}

h4. Hash Join

{{TestComplexTypeReader.test_array()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from HashJoinBatch
HashJoinBatch: Container record count not set
{noformat}

Occurs on the first batch in which the hash join returns {{OK_NEW_SCHEMA}} with 
no records.

h4. Project

{{TestCsvWithHeaders.testEmptyFile()}} (when the text reader returned empty, 
schema-only batches):

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from ProjectRecordBatch
ProjectRecordBatch: Container record count not set
{noformat}

Occurs in {{ProjectRecordBatch.handleNullInput()}}: it sets up the schema but 
does not set the value count to 0.

h4. Unordered Receiver

{{TestCsvWithSchema.testMultiFileSchema()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from UnorderedReceiverBatch
UnorderedReceiverBatch: Container record count not set
{noformat}

The problem is that {{RecordBatchLoader.load()}} does not set the container 
record count.

h4. Streaming Aggregate

{{TestJsonReader.testSumWithTypeCase()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from StreamingAggBatch
StreamingAggBatch: Container record count not set
{noformat}

The problem is that {{StreamingAggBatch.buildSchema()}} does not set the 
container record count to 0.

h4. Limit

{{TestJsonReader.testDrill_1419()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from LimitRecordBatch
LimitRecordBatch: Container record count not set
{noformat}

None of the paths in {{LimitRecordBatch.innerNext()}} set the container record 
count.

h4. Union All

{{TestJsonReader.testKvgenWithUnionAll()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from UnionAllRecordBatch
UnionAllRecordBatch: Container record count not set
{noformat}

When {{UnionAllRecordBatch}} calls 
{{VectorAccessibleUtilities.setValueCount()}}, it does not also set the 
container record count.
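
To make the failure mode concrete, here is a minimal sketch of the pattern, assuming a toy container in which "not set" is modeled as -1. {{ToyVector}}, {{ToyContainer}} and their methods are hypothetical names made up for illustration; they are not Drill classes and do not reproduce Drill's actual code. The sketch only shows why setting the per-vector value counts alone leaves the container count unset, even for an empty batch, and how setting both in one step avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo entry point; all classes in this sketch are hypothetical.
final class ToyContainerDemo {
  public static void main(String[] args) {
    ToyContainer container = new ToyContainer();
    container.addVector();
    container.addVector();

    // Only the vectors are updated: the container still reports "not set".
    container.setValueCountsOnly(0);
    System.out.println("has record count: " + container.hasRecordCount());

    // Both counts are updated together: an empty batch now has count 0.
    container.setAllCounts(0);
    System.out.println("record count: " + container.getRecordCount());
  }
}

// Toy stand-in for a value vector; NOT a Drill class.
final class ToyVector {
  private int valueCount;

  void setValueCount(int count) { valueCount = count; }
  int getValueCount() { return valueCount; }
}

// Toy stand-in for a vector container; NOT a Drill class.
final class ToyContainer {
  private static final int NOT_SET = -1; // assumed sentinel for "never set"

  private final List<ToyVector> vectors = new ArrayList<>();
  private int recordCount = NOT_SET;

  ToyVector addVector() {
    ToyVector vector = new ToyVector();
    vectors.add(vector);
    return vector;
  }

  List<ToyVector> getVectors() { return vectors; }

  boolean hasRecordCount() { return recordCount != NOT_SET; }

  int getRecordCount() {
    if (!hasRecordCount()) {
      throw new IllegalStateException("Container record count not set");
    }
    return recordCount;
  }

  // Sets the per-vector value counts only; the container count is untouched.
  void setValueCountsOnly(int count) {
    for (ToyVector vector : vectors) {
      vector.setValueCount(count);
    }
  }

  // Sets the vector value counts and the container count in one step,
  // so the two cannot drift apart. A count of 0 is a valid, "set" value.
  void setAllCounts(int count) {
    if (count < 0) {
      throw new IllegalArgumentException("Negative row count: " + count);
    }
    setValueCountsOnly(count);
    recordCount = count;
  }
}
```

The point of the single {{setAllCounts()}} entry point in this toy is that a schema-only batch with zero rows is still explicitly marked as having a count of 0, rather than being left indistinguishable from a batch whose count was never set.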

h4. Hash Aggregate

{{TestJsonReader.drill_4479()}}:

{noformat}
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors 
from HashAggBatch
HashAggBatch: Container record count not set
{noformat}

The problem is that {{HashAggBatch.buildSchema()}} does not set the container 
record count to 0 for the first, empty batch sent for {{OK_NEW_SCHEMA}}.

h4. And Many More

It turns out that most operators fail to set one of the many row count variables 
somewhere in their code path: maybe in the schema setup path, maybe when 
building a batch along one of the many paths that operators follow. Further, we 
have multiple row counts that must be set:

* Values in each vector ({{setValueCount()}}).
* Row count in the container ({{setRecordCount()}}), which must be the same as 
the vector value count.
* Row count in the operator (batch), which is the (possibly filtered) count of 
records presented to downstream operators. It must be less than or equal to the 
container row count (except for an SV4).
* The SV2 record count, which is the number of entries in the SV2 and must be 
the same as the batch row count (and less than or equal to the container row 
count).
* The SV2 actual batch record count, which must be the same as the container 
row count.
* The SV4 record count, which must be the same as the batch record count. With 
an SV4, the batch consists of multiple containers, each of which must have an 
accurate container record count.
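
The rules in the list above can be written down as executable checks. The following is a rough sketch under stated assumptions: {{BatchCounts}} and {{RowCountRules}} are hypothetical names invented here, -1 stands for "not set", and the SV4 case is left out for brevity. It is not the {{BatchValidator}} implementation, only an illustration of how the counts relate to one another.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo entry point; all classes in this sketch are hypothetical.
final class RowCountRulesDemo {
  public static void main(String[] args) {
    BatchCounts counts = new BatchCounts();
    counts.vectorValueCounts = new int[] {5, 5};
    // Container count deliberately left unset to trigger the first rule.
    for (String error : RowCountRules.check("ScanBatch", counts)) {
      System.out.println(error);
    }
  }
}

// Simplified snapshot of the counts carried by one batch; NOT a Drill class.
final class BatchCounts {
  static final int NOT_SET = -1;      // assumed sentinel for "never set"

  int[] vectorValueCounts = new int[0]; // one entry per vector
  int containerRecordCount = NOT_SET;
  int batchRecordCount = NOT_SET;       // rows visible to downstream operators
  boolean hasSv2;                       // true if the batch carries an SV2
  int sv2Count = NOT_SET;               // number of entries in the SV2
  int sv2BatchActualCount = NOT_SET;    // SV2's record of the container size
}

final class RowCountRules {
  // Returns one message per violated rule; an empty list means consistent.
  static List<String> check(String operator, BatchCounts b) {
    List<String> errors = new ArrayList<>();
    if (b.containerRecordCount == BatchCounts.NOT_SET) {
      errors.add(operator + ": Container record count not set");
      return errors; // the remaining rules all depend on this count
    }
    // Each vector's value count must equal the container row count.
    for (int i = 0; i < b.vectorValueCounts.length; i++) {
      if (b.vectorValueCounts[i] != b.containerRecordCount) {
        errors.add(operator + ": vector " + i + " value count "
            + b.vectorValueCounts[i] + " != container record count "
            + b.containerRecordCount);
      }
    }
    // The batch row count must be set and no larger than the container count.
    if (b.batchRecordCount == BatchCounts.NOT_SET) {
      errors.add(operator + ": Batch record count not set");
    } else if (b.batchRecordCount > b.containerRecordCount) {
      errors.add(operator + ": batch record count exceeds container record count");
    }
    if (b.hasSv2) {
      // SV2 entries must match the batch row count.
      if (b.sv2Count != b.batchRecordCount) {
        errors.add(operator + ": SV2 record count != batch record count");
      }
      // The SV2 "actual batch" count must match the container row count.
      if (b.sv2BatchActualCount != b.containerRecordCount) {
        errors.add(operator + ": SV2 actual batch record count != container record count");
      }
    }
    return errors;
  }
}
```

For example, a filtered batch with 5 rows in the container, 3 rows selected by the SV2, and an SV2 actual batch count of 5 passes every check in this sketch, while the same batch with the container count left unset fails on the first rule.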



> Scan, Project, Hash Join do not set container record count
> ----------------------------------------------------------
>
>                 Key: DRILL-7325
>                 URL: https://issues.apache.org/jira/browse/DRILL-7325
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
