Abhishek Jindal created ASTERIXDB-3533:
------------------------------------------
Summary: Incorrect behaviour of passthrough nested SELECT
expression
Key: ASTERIXDB-3533
URL: https://issues.apache.org/jira/browse/ASTERIXDB-3533
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB, COMP - Compiler
Affects Versions: 0.9.10
Reporter: Abhishek Jindal
h2. *Steps to reproduce:*
Create a datatype `orderType` as follows:
{code:java}
CREATE DATAVERSE Commerce IF NOT EXISTS;
USE Commerce;
CREATE TYPE orderType IF NOT EXISTS AS {
orderno: int,
custid: string,
ship_date: string?,
order_date: string,
items: [ itemType ]
};
CREATE DATASET orders(orderType) IF NOT EXISTS
PRIMARY KEY orderno;
{code}
Upsert a document to the `orders` dataset:
{code:java}
UPSERT INTO orders2([
{
"orderno": 1001,
"custid": "C41",
"order_date": "2020-04-29",
"ship_date": "2020-05-03",
"randomkey" : "randomval",
"items": [
{
"itemno": 347,
"qty": 5,
"price": 19.99
},
{
"itemno": 193,
"qty": 2,
"price": 28.89
}
]
}]);
{code}
Write the following query, having a select expression in the project phase as
follows:
{code:java}
select
(select element item from o.items as item) as result
from orders as o where orderno = 1001;
{code}
This yields correct results:
{code:java}
[
{
"itemno": 347,
"qty": 5,
"price": 19
},
{
"itemno": 193,
"qty": 2,
"price": 28
}
]
{code}
h2. *Bug*
However, change the scan source from `o.items` to `o.randomkey` as follows:
{code:java}
select
(select element item from o.randomkey as item) as result
from orders as o where orderno = 1001;
{code}
h2. *Expected behaviour:*
Since value of o.randomkey is not a list or unordered list, the scan-collection
operator should fail and return null value. Since there is nothing to add here,
the listify operator should return an empty array Expected response:
{code}
{"result": []}
{code}
h2. *Actual behaviour:*
{code}
{
result: "randomvalue"
}
{code}
----
Observation:
We see here that the plan before optimisation is as follows where the subplan
has scan-collection operator followed by listify:
{code}
distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
|UNPARTITIONED|
project ([$$34]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
|UNPARTITIONED|
assign [$$34] <- [{"result": $$33}] [cardinality: 0.0, op-cost: 0.0,
total-cost: 0.0] -- |UNPARTITIONED|
subplan {
aggregate [$$33] <- [listify($$item)] [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
unnest $$item <- scan-collection($$o.getField("randomkey"))
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
nested tuple source [cardinality: 0.0, op-cost: 0.0,
total-cost: 0.0] -- |UNPARTITIONED|
} [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
|UNPARTITIONED|
select (eq($$o.getField("orderno"), 1005)) [cardinality: 0.0, op-cost:
0.0, total-cost: 0.0] -- |UNPARTITIONED|
unnest $$o <- dataset("Default", "Commerce", "orders2", false)
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost:
0.0] -- |UNPARTITIONED|
{code}
Plan after optimisation:
The subplan is empty:
{code}
distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$34] <- [{"result": $$37}] project: [$$34] [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0]
-- ASSIGN |PARTITIONED|
subplan {
nested tuple source [cardinality: 0.0, op-cost: 0.0,
total-cost: 0.0]
-- NESTED_TUPLE_SOURCE |LOCAL|
} [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- SUBPLAN |PARTITIONED|
assign [$$37] <- [$$o.getField("randomkey")] project: [$$37]
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- ASSIGN |PARTITIONED|
project ([$$o]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- STREAM_PROJECT |PARTITIONED|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$35, $$o] <- index-search("orders2", 0, "Default",
"Commerce", "orders2", false, false, 1, $$38, 1, $$39, true, true, true)
[cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- BTREE_SEARCH |PARTITIONED|
exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$38, $$39] <- [1005, 1005] [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0]
-- ASSIGN |PARTITIONED|
empty-tuple-source [cardinality: 0.0, op-cost: 0.0,
total-cost: 0.0]
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
{code}
---
Worth noting that this disappearance in operators from subplan happen ONLY if
the subquery projects the items from source without any transformations.
For subqueries that do perform transformations, such as :
{code}
select
(select element item.x from o.randomkey as item ) as result
from orders2 as o where orderno = 1005;
{code}
The optimiser does not make any changes in the subplan:
{code}
subplan {
aggregate [$$35] <- [listify($$34)] [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0]
-- AGGREGATE |LOCAL|
assign [$$34] <- [$$item.getField("x")] [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0]
-- ASSIGN |LOCAL|
unnest $$item <- scan-collection($$39) [cardinality: 0.0,
op-cost: 0.0, total-cost: 0.0]
-- UNNEST |LOCAL|
nested tuple source [cardinality: 0.0, op-cost: 0.0,
total-cost: 0.0]
-- NESTED_TUPLE_SOURCE |LOCAL|
}
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)