Hi,
This post concerns a strange (imho) behaviour of recursive WITH update
query as far as execution time is concerned.
I have created (in an "In memory" DB) a table that stores a set of nodes
arranged in a tree structure (actually the skeleton of the contents of a
Windows folder):
create table GRAPH(Parent int, Child int, Level int);
Column "Level" indicates the depth of the node in the tree. The root
node has Level 0, its children Level 1, and so on.
2690 nodes have been inserted, with "Level" initialized to null:
insert into GRAPH(Parent,Child) values (null,1); -- root node (no
parent)
insert into GRAPH(Parent,Child) values (1,9); -- child of the
root node
insert into GRAPH(Parent,Child) values (9,10); -- grand-child of
the root node
insert into GRAPH(Parent,Child) values (9,11);
insert into GRAPH(Parent,Child) values (9,12);
insert into GRAPH(Parent,Child) values (9,13);
etc.
Considering the size of the table, no indexes have been created.
The distribution of nodes among the levels is fairly "normal":
+-------+--------+
| Level | Number |
+-------+--------+
| 0 | 1 |
| 1 | 47 |
| 2 | 215 |
| 3 | 638 |
| 4 | 1010 |
| 5 | 729 |
| 6 | 50 |
+-------+--------+
Now, I would like to compute the "Level" value of all nodes from their
position in the tree. This task immediately suggests a recursive WITH
update:
with recursive
HIERARCHY(FromID,ToID,Level) as
(
select Parent, Child, '0'
from GRAPH where Parent is null
union all
select H.ToID, G.Child, H.Level + 1
from HIERARCHY H, GRAPH G
where H.ToID = G.Parent
)
update GRAPH
set Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);
When this query is executed by SQLite 3.16.2, the timer reports an
execution time of 5.522 s. Adding an index on "Parent" and one on
"Child" just makes things worse (6.381 s.)
I find this figures quite high, so that I try an iterative technique,
which is likely to be close to the execution strategy of the WITH
statement (https://www.sqlite.org/lang_with.html). I translate it for
the CLI shell as follows:
update GRAPH set Level = 0 where Parent is null;
update GRAPH set Level = 1
where Parent in (select Child from GRAPH where Level = 0);
update GRAPH set Level = 2
where Parent in (select Child from GRAPH where Level = 1);
update GRAPH set Level = 3
where Parent in (select Child from GRAPH where Level = 2);
update GRAPH set Level = 4
where Parent in (select Child from GRAPH where Level = 3);
update GRAPH set Level = 5
where Parent in (select Child from GRAPH where Level = 4);
update GRAPH set Level = 6
where Parent in (select Child from GRAPH where Level = 5);
For this script, I get an execution time of 0.015 s., i.e., nearly 370
times less!
Is there something wrong in my queries? Or is there an optimization
trick for WITH queries by which one could approach the performance of
the iterative version?
The scripts are available here:
https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0
Thanks for any advice
Jean-Luc Hainaut
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users