[jira] [Comment Edited] (ASTERIXDB-3037) Array indexing not being used when it seems applicable

Glenn Justo Galvizo (Jira) Fri, 27 May 2022 18:49:08 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543363#comment-17543363
 ]


Glenn Justo Galvizo edited comment on ASTERIXDB-3037 at 5/28/22 1:48 AM:
-------------------------------------------------------------------------

I agree, we should support this use-case! But we are currently limited in a few 
areas. At its core, the existential quantification queries above are join 
queries – so we are limited by a) the mandate to cast the probe value to the 
type of the index, b) the need to include the INLJ hint, and c) the join 
ordering of each dataset. The last point (the join ordering) is the most 
problematic here with the use of LET-CLAUSEs, as these will end up to the right 
of the index dataset and thus miss out on index recognition.

All that being said, our array index isn't entirely useless for the queries 
above. Here's one way we can rewrite the last existential quantification above 
to use the array index:

 
{code:java}
USE HierarchyDemo;
SELECT DISTINCT VALUE e
FROM ( SELECT VALUE tt FROM employees tt WHERE tt.name = "Keshav" LIMIT 1 ) t, 
     employees e
WHERE TO_BIGINT(t.eno) /* +indexnl */ IN e.bosses;{code}
The probe dataset is on the left, and the indexed dataset is on the right.

 


was (Author: ggalvizo):
I agree, we should support this use-case! But we are currently limited in a few 
areas. At its core, the existential quantification queries above are join 
queries – so we are limited by a) the mandate to cast the probe value to the 
type of the index, b) the need to include the INLJ hint, and c) the join 
ordering of each dataset. The last point (the join ordering) is the most 
problematic here with the use of LET-CLAUSEs, as these will end up to the right 
of the index dataset and thus miss out on index recognition.

All that being said, our array index isn't entirely useless for the queries 
above. Here's one way we can rewrite the last existential quantification above 
to use the array index:

 
{code:java}
USE HierarchyDemo;

SELECT VALUE e
FROM employees t, employees e
WHERE t.name = "Keshav" AND 
     TO_BIGINT(t.eno) /* +indexnl */ IN e.bosses; {code}
The probe dataset is on the left, and the indexed dataset is on the right.

 

> Array indexing not being used when it seems applicable
> ------------------------------------------------------
>
>                 Key: ASTERIXDB-3037
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3037
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler, IDX - Indexes, SQL - Translator SQL++
>    Affects Versions: 0.9.8
>            Reporter: Michael J. Carey
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>
> Why aren't array indexes useful for the queries below that have existential 
> predicates on the bosses array?  Seems like they should apply!
>  
> {noformat}
> DROP DATAVERSE HierarchyDemo IF EXISTS;
> CREATE DATAVERSE HierarchyDemo;
> USE HierarchyDemo;
> CREATE TYPE genericType AS { _id: uuid };
> CREATE COLLECTION employees(genericType)
>     PRIMARY KEY _id AUTOGENERATED;
> CREATE INDEX bossIdx ON employees (UNNEST bosses: INT) EXCLUDE UNKNOWN KEY;
> INSERT INTO employees (
> [
>    {
>      "eno":100, "name": "Matt", "sal": 950000,
>      "level": 1,
>      "boss": null,
>      "bosses": []
>    },
>    {
>      "eno":200, "name": "Gopi", "sal": 750000,
>      "level": 2,
>      "boss": 100,
>      "bosses": [100]
>    },
>    {
>      "eno":75, "name": "Mike", "sal": 50000,
>      "level": 3,
>      "boss": 200,
>      "bosses": [100,200]
>    },
>    {
>      "eno":50, "name": "Keshav", "sal": 500000,
>      "level": 3,
>      "boss": 200,
>      "bosses": [100,200]
>    },
>    {
>      "eno":80, "name": "Till", "sal": 400000,
>      "level": 4,
>      "boss": 50,
>      "bosses": [100,200,50]
>    },
>    {
>      "eno":130, "name": "Kamini", "sal": 450000,
>      "level": 4,
>      "boss": 50,
>      "bosses": [100,200,50]
>    },
>    {
>      "eno":1000, "name": "Dmitry", "sal": 300000,
>      "level": 5,
>      "boss": 80,
>      "bosses": [100,200,50,80]
>    },
>    {
>      "eno":300, "name": "Murali", "sal": 275000,
>      "level": 5,
>      "boss": 80,
>      "bosses": [100,200,50,80]
>    },
>    {
>      "eno":70, "name": "Sitaram", "sal": 300000,
>      "level": 5,
>      "boss": 130,
>      "bosses": [100,200,50,130]
>    }
> ]);
> -- Show all employees
> SELECT VALUE e FROM employees e;
> -- Show Keshav's boss
> SELECT VALUE b
> FROM employees e, employees b
> WHERE e.boss = b.eno
> AND e.name = "Keshav";
> -- Show all of Keshav's bosses (ordered by level)
> LET kbossas = (SELECT VALUE k FROM employees k WHERE k.name = 
> "Keshav")[0].bosses
> SELECT VALUE e FROM employees e
> WHERE e.eno IN kbossas
> ORDER BY e.level DESC;
> -- Show Keshav's direct reports
> SELECT VALUE r
> FROM employees e, employees r
> WHERE r.boss = e.eno
> AND e.name = "Keshav";
> -- Show all of Keshav's reports
> LET keshav = (SELECT VALUE t.eno FROM employees t WHERE t.name = "Keshav")[0]
> SELECT VALUE e
> FROM employees e
> WHERE (SOME b IN e.bosses SATISFIES b = keshav);
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Comment Edited] (ASTERIXDB-3037) Array indexing not being used when it seems applicable

Reply via email to