[jira] [Updated] (IGNITE-21277) Sql. Partition pruning. Port partitionExtractor from AI2.

Yury Gerzhedovich (Jira) Fri, 19 Jan 2024 02:39:05 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-21277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yury Gerzhedovich updated IGNITE-21277:
---------------------------------------
    Description: 
In order to prune unnecessary partitions we need to obtain information that 
includes possible "values" of colocation key columns from filter expressions 
for every scan operators prior to statement execution. This can be accomplished 
by traversing an expression tree of scan's filter and collecting expressions 
with colocation key columns (this data is called partition pruning metadata for 
simplicity).

1. Implement a component that takes a physical plan and analyses filter 
expressions of every scan operator and creates (if possible) an expression that 
includes all colocated columns. (The PartitionExtractor from patch 
(https://github.com/apache/ignite/pull/10928/files) can be used a reference 
implementation).

Expression types to analyze:
 * AND
 * EQUALS
 * IS_FALSE
 * IS_NOT_DISTINCT_FROM
 * IS_NOT_FALSE
 * IS_NOT_TRUE
 * IS_TRUE
 * NOT
 * OR
 * SEARCH (operation that tests whether a value is included in a certain range)

2. Update QueryPlan to include partition pruning metadata for every scan 
operator (source_id = <partition pruning metadata>).

Basic examples:

{code:java}
Statement: 
SELECT * FROM t WHERE pk = 7 OR pk = 42

Partition metadata: 
t's source_id = [pk=7 || pk = 42] // Assuming colocation key is equal to 
primary key, || denotes OR operation 

Statement: 
SELECT * FROM t WHERE pk = 7 OR col1 = 1
Partition metadata: [] // Empty, because col1 is not part of a colocation key.

Statement: 
SELECT * FROM t_colo_key1_colo_key2 WHERE colo_key1= 42
Partition metadata: [] // Empty, because colo_key2 is missing 
{code}

—

*Additional examples - partition pruning is possible*

Dynamic parameters:

{code:java}
SELECT * FROM t WHERE pk = ?1 
Partition pruning metadata: t = [ pk = ?1 ]
{code}

Colocation columns reside inside a nested expression:

{code:java}
SELECT * FROM t WHERE col1 = col2 AND (col2 = 100 AND pk = 2) 
Partition pruning metadata: t = [ pk = 2 ]
{code}

Multiple keys:

{code:java}
SELECT * FROM t WHERE col_c1 = 1 AND col_c2 = 2 
Partition pruning metadata:  t = [ (col_c1 = 1, col_c2 = 2) ]
{code}

Complex expression with multiple keys:

{code:java}
SELECT * FROM t WHERE (col_col1 = 100 and col_col2 = 4) OR (col_col1 = 4 and 
col_col2 = 100)
Partition pruning metadata: t = [ (col_col1 = 100, col_col2 = 4) || (col_col1 = 
4, col_col2 = 100) ]
{code}

Multiple tables, assuming that filter b_id = 42 is pushed into scan b, because 
a_id = b_id:

{code:java}
SELECT * FROM a JOIN b WHERE a_id = b_id AND a_id = 42 
Partition pruning metadata: a= [ a_id=42 ], b=[ b_id=42 ]
{code}

---

*Additional examples - partition pruning is not possible*

Columns named col* are not part of colocation key:

{code:java}
SELECT * FROM t WHERE col1 = 10 
// Filter does not use colocation key columns.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col1 = col2 OR pk = 42 
// We need to scan all partitions to figure out which tuples have ‘col1 = col2’
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = 10 AND col_col2 OR col_col1 = 42
// Although first expression uses all colocation key columns the second one 
only uses some.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_c1 = 1 OR col_c2 = 2 
// We need to scan all partitions to figure out which tuples have col_c1 = 1 OR 
col_c2 = 2.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = col_col2 OR col_col2 = 42 
// We need to scan all partitions to figure out which tuples have ‘col_col1 = 
col_col2’
Partition pruning metadata: [] 
{code}



  was:
In order to prune unnecessary partitions we need to obtain information that 
includes possible "values" of colocation key columns from filter expressions 
for every scan operators prior to statement execution. This can be accomplished 
by traversing an expression tree of scan's filter and collecting expressions 
with colocation key columns (this data is called partition pruning metadata for 
simplicity).

1. Implement a component that takes a physical plan and analyses filter 
expressions of every scan operator and creates (if possible) an expression that 
includes all colocated columns. (The PartitionExtractor from patch 
(https://github.com/apache/ignite/pull/10928/files) can be used a reference 
implementation).

Expression types to analyze:
 * AND
 * EQUALS
 * IS_FALSE
 * IS_NOT_DISTINCT_FROM
 * IS_NOT_FALSE
 * IS_NOT_TRUE
 * IS_TRUE
 * NOT
 * OR
 * SEARCH (operation that tests whether a value is included in a certain range)

2. Update QueryPlan to include partition pruning metadata for every scan 
operator (source_id = <partition pruning metadata>).

Basic examples:

{code:java}
Statement: 
SELECT * FROM t WHERE pk = 7 OR pk = 42

Partition metadata: 
t's source_id = [pk=7 || pk = 42] // Assuming colocation key is equal to 
primary key, || denotes OR operation 

Statement: 
SELECT * FROM t WHERE pk = 7 OR col1 = 1
Partition metadata: [] // Empty, because col1 is not part of a colocation key.

Statement: 
SELECT * FROM t_colo_key1_colo_key2 WHERE colo_key1= 42
Partition metadata: [] // Empty, because colo_key2 is missing 
{code}

—

*Additional examples - partition pruning is possible*

Dynamic parameters:

{code:java}
SELECT * FROM t WHERE pk = ?1 
Partition pruning metadata: t = [ pk = ?1 ]
{code}

Colocation columns reside inside a nested expression:

{code:java}
SELECT * FROM t WHERE col1 = col2 AND (col2 = 100 AND pk = 2) 
Partition pruning metadata: t = [ pk = 2 ]
{code}

Multiple keys:

{code:java}
SELECT * FROM t WHERE col_c1 = 1 AND col_c2 = 2 
Partition pruning metadata:  t = [ (col_c1 = 1, col_c2 = 2) ]
{code}

Complex expression with multiple keys:

{code:java}
SELECT * FROM t WHERE (col_col1 = 100 and col_col2 = 4) OR (col_col1 = 4 and 
col_col2 = 100)
Partition pruning metadata: t = [ (col_col1 = 100, col_col2 = 4) || (col_col1 = 
4, col_col2 = 100) ]
{code}

Multiple tables, assuming that filter b_id = 42 is pushed into scan b, because 
a_id = b_id:

{code:java}
SELECT * FROM a JOIN b WHERE a_id = b_id AND a_id = 42 
Partition pruning metadata: a= [ a_id=42 ], b=[ b_id=42 ]
{code}

---

*Additional examples - partition pruning is not possible*

Columns named col* are not part of colocation key:

{code:java}
SELECT * FROM t WHERE col1 = 10 
// Filter does not use colocation key columns.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col1 = col2 OR pk = 42 
// We need to scan all partitions to figure out which tuples have ‘col1 = col2’
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = 10 AND col_col2 OR col_col1 = 42
// Although first expression uses all colocation key columns the second one 
only uses some.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_c1 = 1 OR col_c2 = 2 
// We need to scan all partitions to figure out which tuples have col_c1 = 1 OR 
col_c2 = 2.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = col_col2 OR col_col2 = 42 
// We need to scan all partitions to figure out which tuples have ‘col_col1 = 
col_col2’
Partition pruning metadata: [] 
{code}



> Sql. Partition pruning. Port partitionExtractor from AI2.
> ---------------------------------------------------------
>
>                 Key: IGNITE-21277
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21277
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>    Affects Versions: 3.0.0-beta2
>            Reporter: Maksim Zhuravkov
>            Priority: Major
>              Labels: ignite-3
>
> In order to prune unnecessary partitions we need to obtain information that 
> includes possible "values" of colocation key columns from filter expressions 
> for every scan operators prior to statement execution. This can be 
> accomplished by traversing an expression tree of scan's filter and collecting 
> expressions with colocation key columns (this data is called partition 
> pruning metadata for simplicity).
> 1. Implement a component that takes a physical plan and analyses filter 
> expressions of every scan operator and creates (if possible) an expression 
> that includes all colocated columns. (The PartitionExtractor from patch 
> (https://github.com/apache/ignite/pull/10928/files) can be used a reference 
> implementation).
> Expression types to analyze:
>  * AND
>  * EQUALS
>  * IS_FALSE
>  * IS_NOT_DISTINCT_FROM
>  * IS_NOT_FALSE
>  * IS_NOT_TRUE
>  * IS_TRUE
>  * NOT
>  * OR
>  * SEARCH (operation that tests whether a value is included in a certain 
> range)
> 2. Update QueryPlan to include partition pruning metadata for every scan 
> operator (source_id = <partition pruning metadata>).
> Basic examples:
> {code:java}
> Statement: 
> SELECT * FROM t WHERE pk = 7 OR pk = 42
> Partition metadata: 
> t's source_id = [pk=7 || pk = 42] // Assuming colocation key is equal to 
> primary key, || denotes OR operation 
> Statement: 
> SELECT * FROM t WHERE pk = 7 OR col1 = 1
> Partition metadata: [] // Empty, because col1 is not part of a colocation key.
> Statement: 
> SELECT * FROM t_colo_key1_colo_key2 WHERE colo_key1= 42
> Partition metadata: [] // Empty, because colo_key2 is missing 
> {code}
> —
> *Additional examples - partition pruning is possible*
> Dynamic parameters:
> {code:java}
> SELECT * FROM t WHERE pk = ?1 
> Partition pruning metadata: t = [ pk = ?1 ]
> {code}
> Colocation columns reside inside a nested expression:
> {code:java}
> SELECT * FROM t WHERE col1 = col2 AND (col2 = 100 AND pk = 2) 
> Partition pruning metadata: t = [ pk = 2 ]
> {code}
> Multiple keys:
> {code:java}
> SELECT * FROM t WHERE col_c1 = 1 AND col_c2 = 2 
> Partition pruning metadata:  t = [ (col_c1 = 1, col_c2 = 2) ]
> {code}
> Complex expression with multiple keys:
> {code:java}
> SELECT * FROM t WHERE (col_col1 = 100 and col_col2 = 4) OR (col_col1 = 4 and 
> col_col2 = 100)
> Partition pruning metadata: t = [ (col_col1 = 100, col_col2 = 4) || (col_col1 
> = 4, col_col2 = 100) ]
> {code}
> Multiple tables, assuming that filter b_id = 42 is pushed into scan b, 
> because a_id = b_id:
> {code:java}
> SELECT * FROM a JOIN b WHERE a_id = b_id AND a_id = 42 
> Partition pruning metadata: a= [ a_id=42 ], b=[ b_id=42 ]
> {code}
> ---
> *Additional examples - partition pruning is not possible*
> Columns named col* are not part of colocation key:
> {code:java}
> SELECT * FROM t WHERE col1 = 10 
> // Filter does not use colocation key columns.
> Partition pruning metadata: [] 
> {code}
> {code:java}
> SELECT * FROM t WHERE col1 = col2 OR pk = 42 
> // We need to scan all partitions to figure out which tuples have ‘col1 = 
> col2’
> Partition pruning metadata: [] 
> {code}
> {code:java}
> SELECT * FROM t WHERE col_col1 = 10 AND col_col2 OR col_col1 = 42
> // Although first expression uses all colocation key columns the second one 
> only uses some.
> Partition pruning metadata: [] 
> {code}
> {code:java}
> SELECT * FROM t WHERE col_c1 = 1 OR col_c2 = 2 
> // We need to scan all partitions to figure out which tuples have col_c1 = 1 
> OR col_c2 = 2.
> Partition pruning metadata: [] 
> {code}
> {code:java}
> SELECT * FROM t WHERE col_col1 = col_col2 OR col_col2 = 42 
> // We need to scan all partitions to figure out which tuples have ‘col_col1 = 
> col_col2’
> Partition pruning metadata: [] 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-21277) Sql. Partition pruning. Port partitionExtractor from AI2.

Reply via email to