[
https://issues.apache.org/jira/browse/ASTERIXDB-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wail Y. Alkowaileet resolved ASTERIXDB-3105.
--------------------------------------------
Resolution: Fixed
> LIMIT is not pushed to data-scan in views with generated uuid-primary-keys
> --------------------------------------------------------------------------
>
> Key: ASTERIXDB-3105
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3105
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Wail Y. Alkowaileet
> Assignee: Wail Y. Alkowaileet
> Priority: Major
>
> When creating views on external datasets, a primary key is needed to make
> analytical tools like Tableau happy. However, the data may not contain a
> unique field to act as a primary key. Thus, users may resort to use uuid() to
> act as a primary key by creating a view (e.g., on {*}MyDataset{*}) as follows:
>
> {code:java}USE test;
> CREATE OR REPLACE VIEW myView (
> rowkey string,
> name string
> )
> DEFAULT NULL
> PRIMARY KEY(rowkey) NOT ENFORCED
> AS
> SELECT rowkey, name
> From MyDataset
> LET rowkey = uuid(){code}
>
> The issue with using such a primary key is that LIMIT in queries such as:
> {code:java}USE test;
> EXPLAIN
> SELECT VALUE mv
> FROM myView mv
> LIMIT 1{code}
> will not be pushed down – making such queries quite slow (see the plan below)
> Plan:
> {code:java}distribute result [$$55]
> -- DISTRIBUTE_RESULT |UNPARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
> limit 1
> -- STREAM_LIMIT |UNPARTITIONED|
> project ([$$55])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- SORT_MERGE_EXCHANGE [$$57(ASC) ] |PARTITIONED|
> project ([$$57, $$55])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$55] <- [{\"rowkey\": $$63, \"name\":
> string-default-null($$MyDataset.getField(\"name\"))}]
> -- ASSIGN |PARTITIONED|
> limit 1
> -- STREAM_LIMIT |PARTITIONED|
> select (not(is-null($$63)))
> -- STREAM_SELECT |PARTITIONED|
> project ([$$57, $$MyDataset, $$63])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$63] <- [string-default-null($$rowkey)]
> -- ASSIGN |PARTITIONED|
> assign [$$rowkey] <- [uuid()]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$57, $$MyDataset] <- test.MyDataset
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source {code}
> Clearly, *LIMIT 1* was not pushed to the data-scan as uuid() needs to
> "precede" the call to {*}uuid(){*}, which is required to be checked as in
> {code:java} select (not(is-null($$63)))
> -- STREAM_SELECT |PARTITIONED|
> project ([$$57, $$MyDataset, $$63])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$63] <- [string-default-null($$rowkey)]
> -- ASSIGN |PARTITIONED|
> assign [$$rowkey] <- [uuid()]
> -- ASSIGN |PARTITIONED|{code}
> However, the check select (not(is-null($$63)) is useless here as it is
> impossible that uuid() would generate MISSING or NULL. Removing this check
> would allow the compiler to pushdown the limit to data-scan.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)