Abhishek Jindal created ASTERIXDB-3533:
------------------------------------------

             Summary: Incorrect behaviour of passthrough nested SELECT 
expression
                 Key: ASTERIXDB-3533
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3533
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: *DB - AsterixDB, COMP - Compiler
    Affects Versions: 0.9.10
            Reporter: Abhishek Jindal


h2. *Steps to reproduce:*

Create a datatype `orderType` as follows:
{code:java}
CREATE DATAVERSE Commerce IF NOT EXISTS;
USE Commerce;
CREATE TYPE orderType IF NOT EXISTS AS {
    orderno:                    int,
    custid:                     string,
    ship_date:          string?,
    order_date:          string,
    items:                  [ itemType ]
};
CREATE DATASET orders(orderType) IF NOT EXISTS
    PRIMARY KEY orderno;
{code}
Upsert a document to the `orders` dataset:
{code:java}
UPSERT INTO orders2([
    {
        "orderno": 1001,
        "custid": "C41",
        "order_date": "2020-04-29",
        "ship_date": "2020-05-03",
        "randomkey" : "randomval",
        "items": [
            {
                "itemno": 347,
                "qty": 5,
                "price": 19.99
            },
            {
                "itemno": 193,
                "qty": 2,
                "price": 28.89
            }
        ]
    }]);
{code}
Write the following query, having a select expression in the project phase as 
follows:
{code:java}
select
    (select element item from o.items as item) as result
from orders as o where orderno = 1001;
{code}
This yields correct results:
{code:java}
[
  {
    "itemno": 347,
    "qty": 5,
    "price": 19
  },
  {
    "itemno": 193,
    "qty": 2,
    "price": 28
  }
]
{code}
h2. *Bug*

However, change the scan source from `o.items` to `o.randomkey` as follows:
{code:java}
select
    (select element item from o.randomkey as item) as result
from orders as o where orderno = 1001;
{code}

h2. *Expected behaviour:*

Since value of o.randomkey is not a list or unordered list, the scan-collection 
operator should fail and return null value. Since there is nothing to add here, 
the listify operator should return an empty array Expected response:

{code}
{"result": []}
{code}

h2. *Actual behaviour:*

{code}
{
result: "randomvalue"
}
{code}

----

Observation:

We see here that the plan before optimisation is as follows where the subplan 
has scan-collection operator followed by listify:

{code}
distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
|UNPARTITIONED|
  project ([$$34]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
|UNPARTITIONED|
    assign [$$34] <- [{"result": $$33}] [cardinality: 0.0, op-cost: 0.0, 
total-cost: 0.0] -- |UNPARTITIONED|
      subplan {
                aggregate [$$33] <- [listify($$item)] [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
                  unnest $$item <- scan-collection($$o.getField("randomkey")) 
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
                    nested tuple source [cardinality: 0.0, op-cost: 0.0, 
total-cost: 0.0] -- |UNPARTITIONED|
             } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- 
|UNPARTITIONED|
        select (eq($$o.getField("orderno"), 1005)) [cardinality: 0.0, op-cost: 
0.0, total-cost: 0.0] -- |UNPARTITIONED|
          unnest $$o <- dataset("Default", "Commerce", "orders2", false) 
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
            empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost: 
0.0] -- |UNPARTITIONED|
{code}


Plan after optimisation:

The subplan is empty:

{code}
distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    assign [$$34] <- [{"result": $$37}] project: [$$34] [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0]
    -- ASSIGN  |PARTITIONED|
      subplan {
                nested tuple source [cardinality: 0.0, op-cost: 0.0, 
total-cost: 0.0]
                -- NESTED_TUPLE_SOURCE  |LOCAL|
             } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
      -- SUBPLAN  |PARTITIONED|
        assign [$$37] <- [$$o.getField("randomkey")] project: [$$37] 
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
        -- ASSIGN  |PARTITIONED|
          project ([$$o]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
          -- STREAM_PROJECT  |PARTITIONED|
            exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              unnest-map [$$35, $$o] <- index-search("orders2", 0, "Default", 
"Commerce", "orders2", false, false, 1, $$38, 1, $$39, true, true, true) 
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
              -- BTREE_SEARCH  |PARTITIONED|
                exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  assign [$$38, $$39] <- [1005, 1005] [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0]
                  -- ASSIGN  |PARTITIONED|
                    empty-tuple-source [cardinality: 0.0, op-cost: 0.0, 
total-cost: 0.0]
                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}

---

Worth noting that this disappearance in operators from subplan happen ONLY if 
the subquery projects the items from source without any transformations.

For subqueries that do perform transformations, such as :

{code}
select
    (select element item.x from o.randomkey as item ) as result
from orders2 as o where orderno = 1005;
{code}

The optimiser does not make any changes in the subplan:

{code}
subplan {
                  aggregate [$$35] <- [listify($$34)] [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0]
                  -- AGGREGATE  |LOCAL|
                    assign [$$34] <- [$$item.getField("x")] [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0]
                    -- ASSIGN  |LOCAL|
                      unnest $$item <- scan-collection($$39) [cardinality: 0.0, 
op-cost: 0.0, total-cost: 0.0]
                      -- UNNEST  |LOCAL|
                        nested tuple source [cardinality: 0.0, op-cost: 0.0, 
total-cost: 0.0]
                        -- NESTED_TUPLE_SOURCE  |LOCAL|
               }
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to