[ https://issues.apache.org/jira/browse/IGNITE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Zhuravkov updated IGNITE-28199:
--------------------------------------
    Description: 
The current implementation of the partition pruning (PP) metadata collection 
algorithm does not collect metadata for DML statements that reference multiple 
sources (see examples) or have nested queries. This limitation is a result of 
the current implementation of the algorithm, which has two separate paths for 
traversing rel node trees: a path for queries 
(PartitionPruningMetadataExtractor is also a visitor) and a path for DMLs 
(ModifyNodeVisitor). The path for DMLs is very conservative and rejects many 
valid cases. 

{noformat}
-- These statements have two sources each: a source for ModifyNode and another 
-- source for ScanNode. FunctionScan `breaks` the traversal that collects 
-- metadata, so the resulting metadata is absent:

--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- expected: t(DELETE)={id=1}, t(SELECT)={id=1}
DELETE FROM t WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))

--- Does not capture metadata because it has a nested query:

--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}, t2={id=42}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM t2 WHERE id = 42)
{noformat}

*Proposed solution*:
Implement a unified bottom-up traversal that, in addition to collecting 
metadata from ScanNodes, propagates it up to ModifyNodes.
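
For illustration, the bottom-up idea could be sketched roughly as follows. This 
is a minimal sketch under stated assumptions, not the actual implementation: 
Node, Scan, FunctionScan, Inner, Modify and collect are hypothetical stand-ins 
and not Ignite classes, key predicates are assumed to be already extracted for 
each scan, and the result is keyed by labels that loosely follow the notation 
of the examples above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; every type here is hypothetical, not an Ignite class. */
final class PruningSketch {
    /** Simplified rel node: every node exposes its inputs. */
    interface Node { List<Node> inputs(); }

    /** Table scan with an (assumed already extracted) key predicate. */
    record Scan(String table, Map<String, Object> keys) implements Node {
        public List<Node> inputs() { return List.of(); }
    }

    /** Function scan such as SYSTEM_RANGE: it contributes no metadata. */
    record FunctionScan(String function) implements Node {
        public List<Node> inputs() { return List.of(); }
    }

    /** Any intermediate node (join, filter, project). */
    record Inner(List<Node> inputs) implements Node { }

    /** DML node that modifies the given table. */
    record Modify(String op, String table, Node input) implements Node {
        public List<Node> inputs() { return List.of(input); }
    }

    /** Single bottom-up pass: inputs are visited before the node itself. */
    static Map<String, Map<String, Object>> collect(Node node) {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        for (Node input : node.inputs()) {
            result.putAll(collect(input));
        }
        if (node instanceof Scan scan && !scan.keys().isEmpty()) {
            result.put(scan.table() + "(SELECT)", scan.keys());
        } else if (node instanceof Modify modify) {
            // Propagate metadata of the scan over the modified table up to the DML node.
            Map<String, Object> keys = result.get(modify.table() + "(SELECT)");
            if (keys != null) {
                result.put(modify.table() + "(" + modify.op() + ")", keys);
            }
        }
        return result;
    }

    /** Hand-built tree loosely modelling the nested-query UPDATE example. */
    static Node exampleUpdate() {
        Node scanT = new Scan("t", Map.of("id", 1));
        Node scanT2 = new Scan("t2", Map.of("id", 42));
        return new Modify("UPDATE", "t", new Inner(List.of(scanT, scanT2)));
    }
}
```

In this sketch a FunctionScan contributes nothing instead of aborting the 
traversal, so metadata from the other sources still reaches the Modify node. 
Keying by table name would collide when one table is scanned twice; the sketch 
ignores that case.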




  was:
The current implementation of the partition pruning (PP) metadata collection 
algorithm does not collect metadata for DML statements that reference multiple 
sources (see examples) or have nested queries, and it has two separate paths 
for traversing rel node trees: a path for queries 
(PartitionPruningMetadataExtractor is also a visitor) and a path for DMLs 
(ModifyNodeVisitor). 

{noformat}
-- These statements have two sources each: a source for ModifyNode and another 
-- source for ScanNode. FunctionScan `breaks` the traversal that collects 
-- metadata, so the resulting metadata is absent:

--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- expected: t(DELETE)={id=1}, t(SELECT)={id=1}
DELETE FROM t WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))

--- Does not capture metadata because it has a nested query:

--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}, t2={id=42}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM t2 WHERE id = 42)
{noformat}

*Proposed solution*:
Implement a unified bottom-up traversal that, in addition to collecting 
metadata from ScanNodes, propagates it up to ModifyNodes.





> Sql. Partition pruning. Single bottom-up traversal for both queries and DMLs
> ----------------------------------------------------------------------------
>
>                 Key: IGNITE-28199
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28199
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql ai3
>            Reporter: Maksim Zhuravkov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
