[ https://issues.apache.org/jira/browse/DRILL-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186403#comment-16186403 ]
Vitalii Diravka commented on DRILL-5826:
----------------------------------------
[~Paul.Rogers]
I have the same observations.
Can we skip this first empty batch, just as other empty batches are already skipped (["skip over empty batches"|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java#L161])?
It looks like the following change would resolve the issue:
{code}
// skip over empty batches. we do this since these are basically control messages.
while (batch != null && batch.getHeader().getDef().getRecordCount() == 0) {
  batch = getNextBatch();
}
{code}
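A self-contained sketch of what the proposed loop does to the batch sequence described in the issue. {{SkipEmptyBatches}}, its nested {{Batch}} class, and the queue-backed {{getNextBatch()}} are hypothetical stand-ins for the receiver, Drill's incoming batches, and the real method; only the record count and a flattened column list are modeled. Under these assumptions, the schema-only first batch (row_key, v) is dropped and the first batch handed downstream is (row_key, v.e0). One consequence worth checking: if every batch is empty, the loop exhausts the input and returns null, so no schema is ever produced.

{code:java}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class SkipEmptyBatches {

  // Hypothetical stand-in for a received batch: only the record count and a
  // flattened column list (map children written as "v.e0") are modeled.
  static final class Batch {
    final int recordCount;
    final List<String> columns;

    Batch(int recordCount, List<String> columns) {
      this.recordCount = recordCount;
      this.columns = columns;
    }
  }

  private final Deque<Batch> incoming;

  SkipEmptyBatches(List<Batch> batches) {
    this.incoming = new ArrayDeque<>(batches);
  }

  // Stand-in for getNextBatch(): returns null once the input is exhausted.
  private Batch getNextBatch() {
    return incoming.pollFirst();
  }

  // Mirrors the proposed loop: every zero-row batch, including a schema-only
  // first batch, is treated as a control message and skipped.
  Batch next() {
    Batch batch = getNextBatch();
    while (batch != null && batch.recordCount == 0) {
      batch = getNextBatch();
    }
    return batch;
  }

  public static void main(String[] args) {
    SkipEmptyBatches receiver = new SkipEmptyBatches(Arrays.asList(
        new Batch(0, Arrays.asList("row_key", "v")),       // empty map v, zero rows
        new Batch(1, Arrays.asList("row_key", "v.e0"))));  // map v{e0}, one row
    System.out.println(receiver.next().columns); // [row_key, v.e0]
    System.out.println(receiver.next());         // null
  }
}
{code}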
> UnorderedReceiverBatch fails to detect a schema change within a map
> -------------------------------------------------------------------
>
> Key: DRILL-5826
> URL: https://issues.apache.org/jira/browse/DRILL-5826
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
>
> Run the following HBase query using:
> {code}
> select * from `hbase`.browser_action2 a
> {code}
> Table is defined as:
> {code}
> > create 'browser_action2', 'v', {SPLITS => ['0','1','2','3','4','5','6','7','8','9']}
> ...
> > scan 'browser_action2'
> ROW        COLUMN+CELL
>
>  1         column=v:e0, timestamp=1506560555979, value=abc1
>  2         column=v:e0, timestamp=1506560564807, value=abc2
> {code}
> Step through the {{UnorderedReceiverBatch}} with a parallelization of 1.
> Observe the following (the behavior is nondeterministic):
> * The first batch has schema (row_key, v), where v is an empty map
> (corresponding to a column family), but no data (zero rows).
> * Because the first batch has columns, it is sent downstream with
> {{OK_NEW_SCHEMA}}.
> * The second batch has schema (row_key, v{e0}), where v is a map with column
> e0 (corresponding to a column family with one column) and one row.
> * The code loads the batch, asking the batch itself if it has a new schema.
> * The batch does not have a new schema, so the call returns false.
> * The {{UnorderedReceiverBatch}} returns {{OK}}, indicating to the downstream
> operator that the second batch has the same schema as the first (which, in
> this case, is not true).
> Code in question:
> {code}
> final boolean schemaChanged = batchLoader.load(rbd, batch.getBody());
> {code}
> In fact, each sender has no visibility into the schemas of other
> senders, and the order in which batches arrive is undefined. Therefore, an
> incoming batch has no way of knowing whether it has the same schema as the
> previous output batch.
> The obvious, correct logic is to compare the incoming batch schema with the
> current receiver schema, and to return {{OK}} or {{OK_NEW_SCHEMA}} based on the
> result of that comparison.
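A minimal sketch of that comparison, assuming a schema can be flattened into an ordered list of column paths with map children included (so an empty map v and a map v holding e0 compare as different). {{ReceiverSchemaTracker}}, {{Outcome}}, and {{onBatch}} are hypothetical names, and {{Outcome}} is only a local stand-in for the downstream outcome values; in Drill itself the comparison would be made on real schema objects. The point illustrated is that the decision depends only on what the receiver last reported, not on what an individual sender or the batch loader believes.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReceiverSchemaTracker {

  // Local stand-in for the two outcomes discussed in the issue.
  enum Outcome { OK, OK_NEW_SCHEMA }

  // Schema last reported downstream; null until the first batch arrives.
  private List<String> currentSchema;

  // Compares the incoming batch's full schema (map children included) with the
  // receiver's current schema rather than trusting a per-batch "changed" flag.
  Outcome onBatch(List<String> incomingSchema) {
    if (currentSchema == null || !currentSchema.equals(incomingSchema)) {
      currentSchema = new ArrayList<>(incomingSchema); // defensive copy
      return Outcome.OK_NEW_SCHEMA;
    }
    return Outcome.OK;
  }

  public static void main(String[] args) {
    ReceiverSchemaTracker receiver = new ReceiverSchemaTracker();
    // The two batches from the issue, plus a repeat of the second one.
    System.out.println(receiver.onBatch(Arrays.asList("row_key", "v")));    // OK_NEW_SCHEMA
    System.out.println(receiver.onBatch(Arrays.asList("row_key", "v.e0"))); // OK_NEW_SCHEMA
    System.out.println(receiver.onBatch(Arrays.asList("row_key", "v.e0"))); // OK
  }
}
{code}

Using {{List.equals}} makes column order significant; whether a reordering alone should count as a schema change is a separate design decision.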
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)