[
https://issues.apache.org/jira/browse/ASTERIXDB-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Maxon updated ASTERIXDB-3533:
---------------------------------
Labels: triaged (was: )
> Incorrect behaviour of passthrough nested SELECT expression
> -----------------------------------------------------------
>
> Key: ASTERIXDB-3533
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3533
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB, COMP - Compiler
> Affects Versions: 0.9.10
> Reporter: Abhishek Jindal
> Priority: Major
> Labels: triaged
>
> h2. *Steps to reproduce:*
> Create a datatype `orderType` as follows:
> {code:java}
> CREATE DATAVERSE Commerce IF NOT EXISTS;
> USE Commerce;
> CREATE TYPE orderType IF NOT EXISTS AS {
> orderno: int,
> custid: string,
> ship_date: string?,
> order_date: string,
> items: [ itemType ]
> };
> CREATE DATASET orders(orderType) IF NOT EXISTS
> PRIMARY KEY orderno;
> {code}
> Upsert a document to the `orders` dataset:
> {code:java}
> UPSERT INTO orders2([
> {
> "orderno": 1001,
> "custid": "C41",
> "order_date": "2020-04-29",
> "ship_date": "2020-05-03",
> "randomkey" : "randomval",
> "items": [
> {
> "itemno": 347,
> "qty": 5,
> "price": 19.99
> },
> {
> "itemno": 193,
> "qty": 2,
> "price": 28.89
> }
> ]
> }]);
> {code}
> Write the following query, having a select expression in the project phase as
> follows:
> {code:java}
> select
> (select element item from o.items as item) as result
> from orders as o where orderno = 1001;
> {code}
> This yields correct results:
> {code:java}
> [
> {
> "itemno": 347,
> "qty": 5,
> "price": 19
> },
> {
> "itemno": 193,
> "qty": 2,
> "price": 28
> }
> ]
> {code}
> h2. *Bug*
> However, change the scan source from `o.items` to `o.randomkey` as follows:
> {code:java}
> select
> (select element item from o.randomkey as item) as result
> from orders as o where orderno = 1001;
> {code}
> h2. *Expected behaviour:*
> Since value of o.randomkey is not a list or unordered list, the
> scan-collection operator should fail and return null value. Since there is
> nothing to add here, the listify operator should return an empty array
> Expected response:
> {code}
> {"result": []}
> {code}
> h2. *Actual behaviour:*
> {code}
> {
> result: "randomvalue"
> }
> {code}
> ----
> Observation:
> We see here that the plan before optimisation is as follows where the subplan
> has scan-collection operator followed by listify:
> {code}
> distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
> |UNPARTITIONED|
> project ([$$34]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
> |UNPARTITIONED|
> assign [$$34] <- [{"result": $$33}] [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0] -- |UNPARTITIONED|
> subplan {
> aggregate [$$33] <- [listify($$item)] [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
> unnest $$item <- scan-collection($$o.getField("randomkey"))
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
> nested tuple source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0] -- |UNPARTITIONED|
> } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] --
> |UNPARTITIONED|
> select (eq($$o.getField("orderno"), 1005)) [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
> unnest $$o <- dataset("Default", "Commerce", "orders2", false)
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0] -- |UNPARTITIONED|
> empty-tuple-source [cardinality: 0.0, op-cost: 0.0, total-cost:
> 0.0] -- |UNPARTITIONED|
> {code}
> Plan after optimisation:
> The subplan is empty:
> {code}
> distribute result [$$34] [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- DISTRIBUTE_RESULT |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$34] <- [{"result": $$37}] project: [$$34] [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0]
> -- ASSIGN |PARTITIONED|
> subplan {
> nested tuple source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0]
> -- NESTED_TUPLE_SOURCE |LOCAL|
> } [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- SUBPLAN |PARTITIONED|
> assign [$$37] <- [$$o.getField("randomkey")] project: [$$37]
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ASSIGN |PARTITIONED|
> project ([$$o]) [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- STREAM_PROJECT |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> unnest-map [$$35, $$o] <- index-search("orders2", 0, "Default",
> "Commerce", "orders2", false, false, 1, $$38, 1, $$39, true, true, true)
> [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- BTREE_SEARCH |PARTITIONED|
> exchange [cardinality: 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> assign [$$38, $$39] <- [1005, 1005] [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0]
> -- ASSIGN |PARTITIONED|
> empty-tuple-source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0]
> -- EMPTY_TUPLE_SOURCE |PARTITIONED|
> {code}
> ---
> Worth noting that this disappearance in operators from subplan happen ONLY if
> the subquery projects the items from source without any transformations.
> For subqueries that do perform transformations, such as :
> {code}
> select
> (select element item.x from o.randomkey as item ) as result
> from orders2 as o where orderno = 1005;
> {code}
> The optimiser does not make any changes in the subplan:
> {code}
> subplan {
> aggregate [$$35] <- [listify($$34)] [cardinality: 0.0,
> op-cost: 0.0, total-cost: 0.0]
> -- AGGREGATE |LOCAL|
> assign [$$34] <- [$$item.getField("x")] [cardinality:
> 0.0, op-cost: 0.0, total-cost: 0.0]
> -- ASSIGN |LOCAL|
> unnest $$item <- scan-collection($$39) [cardinality:
> 0.0, op-cost: 0.0, total-cost: 0.0]
> -- UNNEST |LOCAL|
> nested tuple source [cardinality: 0.0, op-cost: 0.0,
> total-cost: 0.0]
> -- NESTED_TUPLE_SOURCE |LOCAL|
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)