[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693310#comment-17693310
 ] 

ASF subversion and git services commented on ASTERIXDB-3105:
------------------------------------------------------------

Commit 92890381c9f3ff40f5d035abe5db46cffdac4978 in asterixdb's branch 
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=92890381c9 ]

[ASTERIXDB-3105][COMP] Fix limit pushdown in views with uuid() keys

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
LIMIT (with and without SELECTs) are not pushed down
when a uuid() is used as a PK in views. The reason
is that queries against views create (sometimes)
unnecessary is-null checks (e.g., on a call to uuid()).

Change-Id: I717fdfca8d09144443b1f49c5588defeb895766e
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17303
Integration-Tests: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>


> LIMIT is not pushed to data-scan in views with generated uuid-primary-keys
> --------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-3105
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3105
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>
> When creating views on external datasets, a primary key is needed to make 
> analytical tools like Tableau happy. However, the data may not contain a 
> unique field to act as a primary key. Thus, users may resort to use uuid() to 
> act as a primary key by creating a view (e.g., on {*}MyDataset{*}) as follows:
>  
> {code:java}USE test;
> CREATE OR REPLACE VIEW myView (
>    rowkey string,
>    name string
> )
> DEFAULT NULL
> PRIMARY KEY(rowkey) NOT ENFORCED
> AS
> SELECT rowkey, name
> From MyDataset
> LET rowkey = uuid(){code}
>  
> The issue with using such a primary key is that LIMIT in queries such as:
> {code:java}USE test;
> EXPLAIN
> SELECT VALUE mv
> FROM myView mv
> LIMIT 1{code}
> will not be pushed down – making such queries quite slow (see the plan below)
> Plan:
> {code:java}distribute result [$$55]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
>     limit 1
>     -- STREAM_LIMIT  |UNPARTITIONED|
>       project ([$$55])
>       -- STREAM_PROJECT  |PARTITIONED|
>         exchange
>         -- SORT_MERGE_EXCHANGE [$$57(ASC) ]  |PARTITIONED|
>           project ([$$57, $$55])
>           -- STREAM_PROJECT  |PARTITIONED|
>             assign [$$55] <- [{\"rowkey\": $$63, \"name\": 
> string-default-null($$MyDataset.getField(\"name\"))}]
>             -- ASSIGN  |PARTITIONED|
>               limit 1
>               -- STREAM_LIMIT  |PARTITIONED|
>                 select (not(is-null($$63)))
>                 -- STREAM_SELECT  |PARTITIONED|
>                   project ([$$57, $$MyDataset, $$63])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     assign [$$63] <- [string-default-null($$rowkey)]
>                     -- ASSIGN  |PARTITIONED|
>                       assign [$$rowkey] <- [uuid()]
>                       -- ASSIGN  |PARTITIONED|
>                         exchange
>                         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                           data-scan []<-[$$57, $$MyDataset] <- test.MyDataset
>                           -- DATASOURCE_SCAN  |PARTITIONED|
>                             exchange
>                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                               empty-tuple-source {code}
> Clearly, *LIMIT 1* was not pushed to the data-scan as uuid() needs to 
> "precede" the call to {*}uuid(){*}, which is required to be checked as in
> {code:java}                select (not(is-null($$63)))
>                 -- STREAM_SELECT  |PARTITIONED|
>                   project ([$$57, $$MyDataset, $$63])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     assign [$$63] <- [string-default-null($$rowkey)]
>                     -- ASSIGN  |PARTITIONED|
>                       assign [$$rowkey] <- [uuid()]
>                       -- ASSIGN  |PARTITIONED|{code}
> However, the check select (not(is-null($$63)) is useless here as it is 
> impossible that uuid() would generate MISSING or NULL. Removing this check 
> would allow the compiler to pushdown the limit to data-scan.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to