[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wail Y. Alkowaileet resolved ASTERIXDB-3105.
--------------------------------------------
    Resolution: Fixed

> LIMIT is not pushed to data-scan in views with generated uuid-primary-keys
> --------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-3105
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3105
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>
> When creating views on external datasets, a primary key is needed to make 
> analytical tools like Tableau happy. However, the data may not contain a 
> unique field to act as a primary key. Thus, users may resort to use uuid() to 
> act as a primary key by creating a view (e.g., on {*}MyDataset{*}) as follows:
>  
> {code:java}USE test;
> CREATE OR REPLACE VIEW myView (
>    rowkey string,
>    name string
> )
> DEFAULT NULL
> PRIMARY KEY(rowkey) NOT ENFORCED
> AS
> SELECT rowkey, name
> From MyDataset
> LET rowkey = uuid(){code}
>  
> The issue with using such a primary key is that LIMIT in queries such as:
> {code:java}USE test;
> EXPLAIN
> SELECT VALUE mv
> FROM myView mv
> LIMIT 1{code}
> will not be pushed down – making such queries quite slow (see the plan below)
> Plan:
> {code:java}distribute result [$$55]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
>     limit 1
>     -- STREAM_LIMIT  |UNPARTITIONED|
>       project ([$$55])
>       -- STREAM_PROJECT  |PARTITIONED|
>         exchange
>         -- SORT_MERGE_EXCHANGE [$$57(ASC) ]  |PARTITIONED|
>           project ([$$57, $$55])
>           -- STREAM_PROJECT  |PARTITIONED|
>             assign [$$55] <- [{\"rowkey\": $$63, \"name\": 
> string-default-null($$MyDataset.getField(\"name\"))}]
>             -- ASSIGN  |PARTITIONED|
>               limit 1
>               -- STREAM_LIMIT  |PARTITIONED|
>                 select (not(is-null($$63)))
>                 -- STREAM_SELECT  |PARTITIONED|
>                   project ([$$57, $$MyDataset, $$63])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     assign [$$63] <- [string-default-null($$rowkey)]
>                     -- ASSIGN  |PARTITIONED|
>                       assign [$$rowkey] <- [uuid()]
>                       -- ASSIGN  |PARTITIONED|
>                         exchange
>                         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                           data-scan []<-[$$57, $$MyDataset] <- test.MyDataset
>                           -- DATASOURCE_SCAN  |PARTITIONED|
>                             exchange
>                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                               empty-tuple-source {code}
> Clearly, *LIMIT 1* was not pushed to the data-scan as uuid() needs to 
> "precede" the call to {*}uuid(){*}, which is required to be checked as in
> {code:java}                select (not(is-null($$63)))
>                 -- STREAM_SELECT  |PARTITIONED|
>                   project ([$$57, $$MyDataset, $$63])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     assign [$$63] <- [string-default-null($$rowkey)]
>                     -- ASSIGN  |PARTITIONED|
>                       assign [$$rowkey] <- [uuid()]
>                       -- ASSIGN  |PARTITIONED|{code}
> However, the check select (not(is-null($$63)) is useless here as it is 
> impossible that uuid() would generate MISSING or NULL. Removing this check 
> would allow the compiler to pushdown the limit to data-scan.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to