Paul Rogers created DRILL-5826:
----------------------------------

             Summary: UnorderedReceiverBatch fails to detect a schema change
                 Key: DRILL-5826
                 URL: https://issues.apache.org/jira/browse/DRILL-5826
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


Run the following HBase query:

{code}
select * from `hbase`.browser_action2 a
{code}

Table is defined as:
{code}
> create 'browser_action2', 'v', {SPLITS => ['0','1','2','3','4','5','6','7','8','9']}
...
> scan 'browser_action2'
ROW                                   COLUMN+CELL
 1                                    column=v:e0, timestamp=1506560555979, value=abc1
 2                                    column=v:e0, timestamp=1506560564807, value=abc2
{code}

Step through {{UnorderedReceiverBatch}} with a parallelization of 1. 
Observe the following (the behavior is nondeterministic):

* The first batch has schema (row_key, v), where v is an empty map 
(corresponding to a column family), but no data (zero rows).
* Because the first batch has columns, it is sent downstream with 
{{OK_NEW_SCHEMA}}.
* The second batch has schema (row_key, v{e0}), where v is a map with column e0 
(corresponding to a column family with one column) and one row.
* The code loads the batch, asking the batch itself whether it has a new schema.
* The batch does not report a new schema, so the load returns false.
* The {{UnorderedReceiverBatch}} returns {{OK}}, indicating to the downstream 
operator that the second batch has the same schema as the first (which, in this 
case, is not true).
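
The sequence above can be modeled with a small, simplified sketch. All names here ({{Outcome}}, {{IncomingBatch}}, {{FlagTrustingReceiver}}) are hypothetical stand-ins rather than Drill's actual classes, and the model assumes that the receiver's outcome depends only on what the batch reports about itself:

```java
// Hypothetical, simplified model of the observed sequence; not Drill code.
enum Outcome { OK, OK_NEW_SCHEMA }

final class IncomingBatch {
    final String schema;            // textual schema, e.g. "(row_key, v{e0})"
    final int rowCount;
    final boolean reportsNewSchema; // what the batch says about itself

    IncomingBatch(String schema, int rowCount, boolean reportsNewSchema) {
        this.schema = schema;
        this.rowCount = rowCount;
        this.reportsNewSchema = reportsNewSchema;
    }
}

final class FlagTrustingReceiver {
    // Assumption: the outcome is derived only from the batch's own flag,
    // never from a comparison with what was last sent downstream.
    Outcome next(IncomingBatch batch) {
        return batch.reportsNewSchema ? Outcome.OK_NEW_SCHEMA : Outcome.OK;
    }
}

class FlagTrustingDemo {
    public static void main(String[] args) {
        FlagTrustingReceiver receiver = new FlagTrustingReceiver();
        IncomingBatch first = new IncomingBatch("(row_key, v{})", 0, true);
        IncomingBatch second = new IncomingBatch("(row_key, v{e0})", 1, false);

        System.out.println(receiver.next(first));   // prints OK_NEW_SCHEMA
        System.out.println(receiver.next(second));  // prints OK
        // The schemas differ even though the second outcome was OK.
        System.out.println(first.schema.equals(second.schema)); // prints false
    }
}
```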

Code in question:

{code}
      final boolean schemaChanged = batchLoader.load(rbd, batch.getBody());
{code}

In fact, no sender has visibility into the schemas of the other senders, and 
the order in which batches are received is undefined. Therefore, an input batch 
has no way of knowing whether it has the same schema as the previous output batch.

The obvious, correct logic is to compare the incoming batch schema with the 
current receiver schema, and send {{OK}} or {{OK_NEW_SCHEMA}} based on the 
result of that comparison.
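
A minimal sketch of that comparison follows. It is an illustration under stated assumptions, not a proposed patch: {{Outcome}} and {{ReceiverSchemaTracker}} are hypothetical names, and a schema is reduced to a list of field paths rather than Drill's real schema type.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the two outcomes discussed in this issue.
enum Outcome { OK, OK_NEW_SCHEMA }

// Hypothetical helper: remembers the schema last sent downstream.
final class ReceiverSchemaTracker {
    // Null until the first batch arrives.
    private List<String> currentSchema;

    // Compare the incoming schema with the receiver's own current schema
    // instead of asking the incoming batch whether its schema changed.
    Outcome accept(List<String> incomingSchema) {
        if (Objects.equals(currentSchema, incomingSchema)) {
            return Outcome.OK;
        }
        currentSchema = List.copyOf(incomingSchema);
        return Outcome.OK_NEW_SCHEMA;
    }
}

class ReceiverSchemaDemo {
    public static void main(String[] args) {
        ReceiverSchemaTracker tracker = new ReceiverSchemaTracker();
        // First batch: (row_key, v) with an empty map v.
        System.out.println(tracker.accept(List.of("row_key", "v")));     // prints OK_NEW_SCHEMA
        // Second batch: (row_key, v{e0}); the map gained a column.
        System.out.println(tracker.accept(List.of("row_key", "v.e0")));  // prints OK_NEW_SCHEMA
        // Third batch with the same schema as the second.
        System.out.println(tracker.accept(List.of("row_key", "v.e0")));  // prints OK
    }
}
```

Because the decision depends only on state held by the receiver, it does not matter which sender produced a batch or in what order the batches arrive.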



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
