Recent SNAPSHOT builds of ARQ have started to include a pair of new optimizers
(TransformFilterImplicitJoin and TransformImplicitLeftJoin) which aim to
address performance deficiencies in a certain style of query. In our testing
we have seen performance improvements of 1-2 orders of magnitude for these
kinds of queries.

These optimizations target queries of the following general forms:
1 – Implicit Join – Queries where a FILTER applies a ?x = ?y or SAMETERM(?x,
?y) constraint e.g.
SELECT *
WHERE
{
  ?x ?p1 ?o1 .
  ?y ?p2 ?o2 .
  FILTER(?x = ?y)
}
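
As a rough illustration of what eliminating the implicit join achieves, the query above can be thought of as equivalent to the form below. This is a hand-written sketch, not actual optimizer output (the real transformation works on the query algebra rather than the query text). It assumes ?x and ?y can only be bound to IRIs or blank nodes, which holds here because both appear in the subject position:

```sparql
# Hand-written equivalent of the implicit join query (illustrative only).
# The equality is folded into the patterns by reusing ?x, and ?y is then
# assigned from ?x so that SELECT * still returns both variables.
SELECT *
WHERE
{
  ?x ?p1 ?o1 .
  ?x ?p2 ?o2 .
  BIND(?x AS ?y)
}
```

Written this way the two triple patterns share a variable, so they can be evaluated as an ordinary join instead of, in the worst case, a cross product followed by a filter.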
2 – Implicit Left Join – Queries where a FILTER applies a ?x = ?y or
SAMETERM(?x, ?y) over a left join e.g.
SELECT *
WHERE
{
  ?x ?p1 ?o1 .
  OPTIONAL
  {
    ?y ?p2 ?o2 .
  }
  FILTER(?x = ?y)
}

In both cases the optimizers are conservative: the optimization is applied
only when it is considered safe, and the query is otherwise left unchanged.
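
To illustrate why safety matters, here is a hypothetical query where substituting one variable for the other could change the results. This sketch relies only on standard SPARQL semantics and is not a statement of the exact checks the optimizers perform:

```sparql
# Hypothetical example: ?x and ?y are in the object position, so either
# may be bound to a literal.
SELECT *
WHERE
{
  ?s1 ?p1 ?x .
  ?s2 ?p2 ?y .
  FILTER(?x = ?y)
}
```

The = operator compares literals by value, so "1"^^xsd:integer = "1.0"^^xsd:decimal is true even though the two are different RDF terms. Replacing ?y with ?x in the patterns would only match identical terms and could therefore drop solutions. SAMETERM(?x, ?y) tests term identity, so it does not have this particular problem.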

While we have already added many test cases to this effect, we would
appreciate it if users who have workloads with this style of query could run
the latest SNAPSHOT against their queries. This would help check that we are
not applying the optimization in cases which are unsafe, and that we have not
introduced performance regressions (e.g. due to the new optimizations blocking
other optimizations).

Reports of any queries that exhibit these issues would be appreciated. Reports
of improved performance would also provide useful validation of this work.

The work is ongoing and there are further cases we could optimize but do not
yet, so expect further improvements in this area.

Thanks,
Rob