[
https://issues.apache.org/jira/browse/IGNITE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maksim Zhuravkov updated IGNITE-28199:
--------------------------------------
Description:
The current implementation of the partition pruning (PP) metadata collection
algorithm does not collect metadata for DML statements that reference multiple
sources (see examples) or contain nested queries. This limitation results from
the current implementation, which has two separate paths for traversing rel
node trees: a path for queries (PartitionPruningMetadataExtractor, which is
also a visitor) and a path for DMLs (ModifyNodeVisitor). The path for DMLs is
very conservative and rejects many valid cases.
{noformat}
-- Each of these statements has two sources: one for ModifyNode and another
-- for ScanNode. FunctionScan `breaks` the traversal that collects metadata, so
-- the resulting metadata is absent:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- expected: t(DELETE)={id=1}, t(SELECT)={id=1}
DELETE FROM t WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- Does not capture metadata because it has a nested query:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}, t2={id=42}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM t2 WHERE id = 42)
{noformat}
*Proposed solution*:
Implement a unified bottom-up traversal that, in addition to collecting
metadata from ScanNodes, propagates it up to ModifyNodes.
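The idea of a single bottom-up pass can be illustrated with a small,
language-neutral sketch. This is not Ignite code: RelNode, collect, the
key_filter field and the "t(SELECT)" style labels are hypothetical names chosen
for illustration, and the sketch assumes at most one scan per table and that a
ModifyNode may reuse the key filter of the scan of the same table below it.
A function scan simply contributes no metadata instead of aborting the pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RelNode:
    """Hypothetical, simplified rel node; not an Ignite class."""
    kind: str                                    # "scan", "function_scan", "join" or "modify"
    table: Optional[str] = None                  # table read or modified, if any
    key_filter: Optional[Dict[str, int]] = None  # equality conditions on key columns
    op: Optional[str] = None                     # "UPDATE" or "DELETE" for modify nodes
    inputs: List["RelNode"] = field(default_factory=list)


def collect(node: RelNode) -> Dict[str, Dict[str, int]]:
    """Return {label: key_filter} for the subtree rooted at node."""
    meta: Dict[str, Dict[str, int]] = {}
    # Visit children first, so metadata flows from the leaves upwards.
    for child in node.inputs:
        meta.update(collect(child))
    if node.kind == "scan" and node.key_filter:
        # A scan with a key filter contributes its own entry.
        meta[f"{node.table}(SELECT)"] = dict(node.key_filter)
    elif node.kind == "modify":
        # Assumption: the modify node reuses the filter of the scan
        # of the same table found in its subtree.
        scan_meta = meta.get(f"{node.table}(SELECT)")
        if scan_meta is not None:
            meta[f"{node.table}({node.op})"] = dict(scan_meta)
    # Other node kinds (join, function_scan) add nothing but do not stop the pass.
    return meta


# DELETE with a function scan as the second source.
with_function = RelNode("modify", table="t", op="DELETE", inputs=[
    RelNode("join", inputs=[
        RelNode("scan", table="t", key_filter={"id": 1}),
        RelNode("function_scan"),
    ]),
])

# UPDATE with a nested query over t2.
nested = RelNode("modify", table="t", op="UPDATE", inputs=[
    RelNode("join", inputs=[
        RelNode("scan", table="t", key_filter={"id": 1}),
        RelNode("scan", table="t2", key_filter={"id": 42}),
    ]),
])

print(collect(with_function))
print(collect(nested))
```

In this toy model both plans yield an entry for the modified table as well as
for every filtered scan, which mirrors the expected results listed in the
examples above. A real implementation would also need to handle several scans
of the same table and more complex predicates.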
was:
The current implementation of the partition pruning (PP) metadata collection
algorithm does not collect metadata for DML statements that reference multiple
sources (see examples) or contain nested queries, and it has two separate paths
for traversing rel node trees: a path for queries
(PartitionPruningMetadataExtractor, which is also a visitor) and a path for
DMLs (ModifyNodeVisitor).
{noformat}
-- Each of these statements has two sources: one for ModifyNode and another
-- for ScanNode. FunctionScan `breaks` the traversal that collects metadata, so
-- the resulting metadata is absent:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- expected: t(DELETE)={id=1}, t(SELECT)={id=1}
DELETE FROM t WHERE id=1 and c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- Does not capture metadata because it has a nested query:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}, t2={id=42}
UPDATE t SET c1=100 WHERE id=1 and c2 IN (SELECT * FROM t2 WHERE id = 42)
{noformat}
*Proposed solution*:
Implement a unified bottom-up traversal that, in addition to collecting
metadata from ScanNodes, propagates it up to ModifyNodes.
> Sql. Partition pruning. Single bottom-up traversal for both queries and DMLs
> ----------------------------------------------------------------------------
>
> Key: IGNITE-28199
> URL: https://issues.apache.org/jira/browse/IGNITE-28199
> Project: Ignite
> Issue Type: Improvement
> Components: sql ai3
> Reporter: Maksim Zhuravkov
> Priority: Major
> Labels: ignite-3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)