[
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987956#comment-14987956
]
ASF GitHub Bot commented on DRILL-2288:
---------------------------------------
Github user dsbos commented on a diff in the pull request:
https://github.com/apache/drill/pull/228#discussion_r43796603
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
---
@@ -325,10 +325,13 @@ public AggOutcome doWork() {
if (EXTRA_DEBUG_1) {
logger.debug("Received new schema. Batch has {}
records.", incoming.getRecordCount());
}
- newSchema = true;
- this.cleanup();
- // TODO: new schema case needs to be handled appropriately
- return AggOutcome.UPDATE_AGGREGATOR;
+ final BatchSchema newIncomingSchema = incoming.getSchema();
+ if ((! newIncomingSchema.equals(schema)) && schema !=
null) {
--- End diff --
> You shouldn't put a hack in to ignore a valid schema change.
Yeah, I'm reverting this HashAggTemplate change.
> I don't know all the details but if this is related to HBase, my thought
is you probably need
> to resolve 4010 to avoid spurious schema changes.
It's not clear whether it's related to HBase or not. (The relevant test
failure did occur in an HBase test, but it was in a Jenkins-run cluster test
that has been hard to reproduce elsewhere. At one point (I thought) I had a
reproduction using just JSON files, but now that doesn't seem to work.)
> You should make sure to avoid propagating schema changes that are not
real/required.
Related to that, ExternalSortBatch seems to return two OK_NEW_SCHEMA
batches with schemas differing only in their selection vectors:
11:23:50.149 [main] TRACE o.a.d.e.p.i.v.IteratorValidatorBatchIterator -
[#3; on ExternalSortBatch]: incoming next() return: #records = 204,
schema:
BatchSchema [fields=[`key1`(INT:REQUIRED), `alt`(BIGINT:REQUIRED)],
selectionVector=FOUR_BYTE],
prev. new (not equal):
BatchSchema [fields=[`key1`(INT:REQUIRED), `alt`(BIGINT:REQUIRED)],
selectionVector=NONE]
(The earlier schema ("prev. new:") seems to come from setting up the sort,
and the latter one ("schema:") seems to come from completing the sort (it was
triggered by a NONE from upstream).)
Is that correct? (Should the selection vectors have been the same? Should
the second return have been OK instead of OK_NEW_SCHEMA?)
If that is correct (e.g., vectors were changed and should have been), how
should downstream operators, or at least aggregation, handle that?
Should they be able to handle a "technical" schema change (OK_NEW_SCHEMA
with possibly-changed vectors or certain details of the schema) that is not a
logical schema change (not a change in the logical fields or types in the
aggregated value), even if arbitrary logical schema changes can't be handled
(e.g., because some don't even make sense)?
> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing
> JDBC metadata (schema) for 0-row results...]
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Information Schema
> Reporter: Daniel Barclay (Drill)
> Assignee: Daniel Barclay (Drill)
> Fix For: 1.3.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up
> (getColumnCount() returns zero, and trying to access any other metadata
> throws IndexOutOfBoundsException) for a result set with zero rows, at least
> for one from DatabaseMetaData.getColumns(...).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)