[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Maxon updated ASTERIXDB-3533:
---------------------------------
    Labels: triaged  (was: )

> Incorrect behaviour of passthrough nested SELECT expression
> -----------------------------------------------------------
>
>                 Key: ASTERIXDB-3533
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3533
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: *DB - AsterixDB, COMP - Compiler
>    Affects Versions: 0.9.10
>            Reporter: Abhishek Jindal
>            Priority: Major
>              Labels: triaged
>
> h2. *Steps to reproduce:*
> Create a datatype `orderType` as follows:
> {code:java}
> CREATE DATAVERSE Commerce IF NOT EXISTS;
> USE Commerce;
> CREATE TYPE orderType IF NOT EXISTS AS {
>     orderno:                  int,
>     custid:                   string,
>     ship_date:                string?,
>     order_date:          string,
>     items:                [ itemType ]
> };
> CREATE DATASET orders(orderType) IF NOT EXISTS
>     PRIMARY KEY orderno;
> {code}
> Upsert a document to the `orders` dataset:
> {code:java}
> UPSERT INTO orders2([
>     {
>         "orderno": 1001,
>         "custid": "C41",
>         "order_date": "2020-04-29",
>         "ship_date": "2020-05-03",
>         "randomkey" : "randomval",
>         "items": [
>             {
>                 "itemno": 347,
>                 "qty": 5,
>                 "price": 19.99
>             },
>             {
>                 "itemno": 193,
>                 "qty": 2,
>                 "price": 28.89
>             }
>         ]
>     }]);
> {code}
> Write the following query, having a select expression in the project phase as 
> follows:
> {code:java}
> select
>     (select element item from o.items as item) as result
> from orders as o where orderno = 1001;
> {code}
> This yields correct results:
> {code:java}
> [
>   {
>     "itemno": 347,
>     "qty": 5,
>     "price": 19
>   },
>   {
>     "itemno": 193,
>     "qty": 2,
>     "price": 28
>   }
> ]
> {code}
> h2. *Bug*
> However, change the scan source from `o.items` to `o.randomkey` as follows:
> {code:java}
> select
>     (select element item from o.randomkey as item) as result
> from orders as o where orderno = 1001;
> {code}
> h2. *Expected behaviour:*
> Since value of o.randomkey is not a list or unordered list, the 
> scan-collection operator should fail and return null value. Since there is 
> nothing to add here, the listify operator should return an empty array 
> Expected response:
> {code}
> {"result": []}
> {code}
> h2. *Actual behaviour:*
> {code}
> {
> result: "randomvalue"
> }
> {code}
> ----
> Observation:
> We see here that the plan before optimisation is as follows where the subplan 
> has scan-collection operator followed by listify:
> {code}
> distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
> |UNPARTITIONED|
>   project ([$$34]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
> |UNPARTITIONED|
>     assign [$$34] <- [{"result": $$33}] [cardinality: 0.0, op-cost: 0.0, 
> total-cost: 0.0] -- |UNPARTITIONED|
>       subplan {
>                 aggregate [$$33] <- [listify($$item)] [cardinality: 0.0, 
> op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
>                   unnest $$item <- scan-collection($$o.getField("randomkey")) 
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
>                     nested tuple source [cardinality: 0.0, op-cost: 0.0, 
> total-cost: 0.0] -- |UNPARTITIONED|
>              } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
> |UNPARTITIONED|
>         select (eq($$o.getField("orderno"), 1005)) [cardinality: 0.0, 
> op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
>           unnest $$o <- dataset("Default", "Commerce", "orders2", false) 
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
>             empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost: 
> 0.0] -- |UNPARTITIONED|
> {code}
> Plan after optimisation:
> The subplan is empty:
> {code}
> distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     assign [$$34] <- [{"result": $$37}] project: [$$34] [cardinality: 0.0, 
> op-cost: 0.0, total-cost: 0.0]
>     -- ASSIGN  |PARTITIONED|
>       subplan {
>                 nested tuple source [cardinality: 0.0, op-cost: 0.0, 
> total-cost: 0.0]
>                 -- NESTED_TUPLE_SOURCE  |LOCAL|
>              } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>       -- SUBPLAN  |PARTITIONED|
>         assign [$$37] <- [$$o.getField("randomkey")] project: [$$37] 
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>         -- ASSIGN  |PARTITIONED|
>           project ([$$o]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>           -- STREAM_PROJECT  |PARTITIONED|
>             exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>               unnest-map [$$35, $$o] <- index-search("orders2", 0, "Default", 
> "Commerce", "orders2", false, false, 1, $$38, 1, $$39, true, true, true) 
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>               -- BTREE_SEARCH  |PARTITIONED|
>                 exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
>                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                   assign [$$38, $$39] <- [1005, 1005] [cardinality: 0.0, 
> op-cost: 0.0, total-cost: 0.0]
>                   -- ASSIGN  |PARTITIONED|
>                     empty-tuple-source [cardinality: 0.0, op-cost: 0.0, 
> total-cost: 0.0]
>                     -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
> {code}
> ---
> Worth noting that this disappearance in operators from subplan happen ONLY if 
> the subquery projects the items from source without any transformations.
> For subqueries that do perform transformations, such as :
> {code}
> select
>     (select element item.x from o.randomkey as item ) as result
> from orders2 as o where orderno = 1005;
> {code}
> The optimiser does not make any changes in the subplan:
> {code}
> subplan {
>                   aggregate [$$35] <- [listify($$34)] [cardinality: 0.0, 
> op-cost: 0.0, total-cost: 0.0]
>                   -- AGGREGATE  |LOCAL|
>                     assign [$$34] <- [$$item.getField("x")] [cardinality: 
> 0.0, op-cost: 0.0, total-cost: 0.0]
>                     -- ASSIGN  |LOCAL|
>                       unnest $$item <- scan-collection($$39) [cardinality: 
> 0.0, op-cost: 0.0, total-cost: 0.0]
>                       -- UNNEST  |LOCAL|
>                         nested tuple source [cardinality: 0.0, op-cost: 0.0, 
> total-cost: 0.0]
>                         -- NESTED_TUPLE_SOURCE  |LOCAL|
>                }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to