Paul Rogers created DRILL-7428:
----------------------------------

             Summary: Drill incorrectly allows a repeated map field to be 
projected to top level
                 Key: DRILL-7428
                 URL: https://issues.apache.org/jira/browse/DRILL-7428
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Paul Rogers


Consider the following query from the [MongoDB 
tests|https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/MongoTestConstants.java#L80]:

{noformat}
select t.name as name, t.topping.type as type 
  from mongo.%s.`%s` t where t.sales >= 150
{noformat}


The query is used in 
[{{TestMongoQueries.testUnShardedDBInShardedClusterWithProjectionAndFilter()}}|https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoQueries.java#L89].
 
Here it turns out that {{topping}} is a repeated map. The query projects a 
member of that map ({{topping.type}}) to the top level. The query returns five 
rows, but the repeated map holds 24 values in total. The Project operator 
allows the projection, producing an output batch in which most vectors have 5 
values, but the {{type}} column taken from {{topping}}, now at the top level 
and no longer in the map, has 24 values.

As a result, the first five values, all formerly associated with the first 
record, are now spread across the five top-level records, one per record, 
while the values formerly associated with the other four records are lost.

Thus, this is a data corruption bug.
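
The mismatch described above can be illustrated with a small standalone sketch. 
It is not Drill code: the offsets/values layout, the function names, and the 
per-row split of the 24 values (5, 5, 5, 5, 4) are all assumptions chosen only 
to match the counts in this report.

```python
# Hypothetical model of a repeated map member: one flat value vector plus an
# offsets vector marking where each row's entries begin and end.
ROW_COUNT = 5
OFFSETS = [0, 5, 10, 15, 20, 24]  # assumed split of the 24 values across 5 rows

# Label each value with the row it belongs to, e.g. "r0-t3" = row 0, topping 3.
TOPPING_TYPE = [
    f"r{row}-t{i}"
    for row in range(ROW_COUNT)
    for i in range(OFFSETS[row + 1] - OFFSETS[row])
]


def project_naively(values):
    """Hoist the inner vector to the top level and drop the offsets."""
    return list(values)


def project_per_row(values, offsets):
    """Keep the row association by slicing the values with the offsets."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


naive = project_naively(TOPPING_TYPE)
# A reader that trusts the batch row count only looks at the first 5 entries.
visible = naive[:ROW_COUNT]
per_row = project_per_row(TOPPING_TYPE, OFFSETS)

print(len(naive), "values in a batch of", ROW_COUNT, "rows")
print("seen by the reader:", visible)
print("row-preserving result:", per_row)
```

In this sketch every value a reader sees comes from row 0, and nothing from 
rows 1-4 is reachable, which is the behavior reported here. Keeping the offsets 
(or rejecting the projection) avoids the mismatch; which of the two Drill 
should do is left open by this report.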



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
