[
https://issues.apache.org/jira/browse/ASTERIXDB-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693310#comment-17693310
]
ASF subversion and git services commented on ASTERIXDB-3105:
------------------------------------------------------------
Commit 92890381c9f3ff40f5d035abe5db46cffdac4978 in asterixdb's branch
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=92890381c9 ]
[ASTERIXDB-3105][COMP] Fix limit pushdown in views with uuid() keys
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
LIMIT (with and without SELECTs) are not pushed down
when a uuid() is used as a PK in views. The reason
is that queries against views create (sometimes)
unnecessary is-null checks (e.g., on a call to uuid()).
Change-Id: I717fdfca8d09144443b1f49c5588defeb895766e
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17303
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
> LIMIT is not pushed to data-scan in views with generated uuid-primary-keys
> --------------------------------------------------------------------------
>
> Key: ASTERIXDB-3105
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3105
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Wail Y. Alkowaileet
> Assignee: Wail Y. Alkowaileet
> Priority: Major
>
> When creating views on external datasets, a primary key is needed to make
> analytical tools like Tableau happy. However, the data may not contain a
> unique field to act as a primary key. Thus, users may resort to use uuid() to
> act as a primary key by creating a view (e.g., on {*}MyDataset{*}) as follows:
>
> {code:java}USE test;
> CREATE OR REPLACE VIEW myView (
> rowkey string,
> name string
> )
> DEFAULT NULL
> PRIMARY KEY(rowkey) NOT ENFORCED
> AS
> SELECT rowkey, name
> From MyDataset
> LET rowkey = uuid(){code}
>
> The issue with using such a primary key is that LIMIT in queries such as:
> {code:java}USE test;
> EXPLAIN
> SELECT VALUE mv
> FROM myView mv
> LIMIT 1{code}
> will not be pushed down – making such queries quite slow (see the plan below)
> Plan:
> {code:java}distribute result [$$55]
> -- DISTRIBUTE_RESULT |UNPARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
> limit 1
> -- STREAM_LIMIT |UNPARTITIONED|
> project ([$$55])
> -- STREAM_PROJECT |PARTITIONED|
> exchange
> -- SORT_MERGE_EXCHANGE [$$57(ASC) ] |PARTITIONED|
> project ([$$57, $$55])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$55] <- [{\"rowkey\": $$63, \"name\":
> string-default-null($$MyDataset.getField(\"name\"))}]
> -- ASSIGN |PARTITIONED|
> limit 1
> -- STREAM_LIMIT |PARTITIONED|
> select (not(is-null($$63)))
> -- STREAM_SELECT |PARTITIONED|
> project ([$$57, $$MyDataset, $$63])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$63] <- [string-default-null($$rowkey)]
> -- ASSIGN |PARTITIONED|
> assign [$$rowkey] <- [uuid()]
> -- ASSIGN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> data-scan []<-[$$57, $$MyDataset] <- test.MyDataset
> -- DATASOURCE_SCAN |PARTITIONED|
> exchange
> -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
> empty-tuple-source {code}
> Clearly, *LIMIT 1* was not pushed to the data-scan as uuid() needs to
> "precede" the call to {*}uuid(){*}, which is required to be checked as in
> {code:java} select (not(is-null($$63)))
> -- STREAM_SELECT |PARTITIONED|
> project ([$$57, $$MyDataset, $$63])
> -- STREAM_PROJECT |PARTITIONED|
> assign [$$63] <- [string-default-null($$rowkey)]
> -- ASSIGN |PARTITIONED|
> assign [$$rowkey] <- [uuid()]
> -- ASSIGN |PARTITIONED|{code}
> However, the check select (not(is-null($$63)) is useless here as it is
> impossible that uuid() would generate MISSING or NULL. Removing this check
> would allow the compiler to pushdown the limit to data-scan.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)