Hi Julian,

Thanks for referencing our Datalog query processing paper [5]. I have been
thinking about the same idea for a while now too :) I think Calcite is very
well positioned as a generic query optimizer to add Datalog/recursive query
support. Also, it makes a lot of sense since it opens a completely new
dimension for the kind of logic and queries that Calcite can handle,
including but not limited to graph queries, and that can be immediately
available to engines talking to Calcite.

To answer your questions, we probably need to add a transitive closure
operator. This 1988 paper <http://ieeexplore.ieee.org/document/42731/> by
Rakesh Agrawal proposes the notion of alpha relations, and defines an alpha
operator on top of them which computes the transitive closure of alpha
relations. The operator fits well with the rest of Cod's relational algebra
operators.

For query optimizations, one of the commonly used Datalog optimizations is
Semi-naive evaluation, where instead of re-evaluating the recursive program
using all existing facts, only new facts inferred since last iteration are
used. Datalog optimizations become much more interesting when introducing
aggregation and negation, and it is still an open research question, but
there is already some tangible progress. Otherwise, as you mentioned
transitive closure is repeated joins, so pretty much many of the join
optimizations apply such as predicate pushdown, and aggregation
pushdown/pull up.

Regarding the effort, we can always start from basic features and expand
from there. I have already started working on the parser, AST and logical
plan builder for basic Datalog without recursion. I am happy to create a
JIRA ticket to track this effort there.

Thanks,
Walaa.


On Fri, Dec 15, 2017 at 10:26 AM, Julian Hyde <[email protected]> wrote:

> I've been thinking about Datalog front end to Calcite. How much effort
> would it be? Would there be an audience who would find it useful?
>
> The genesis of the idea is talks by Frank McSherry[1] and Vasia
> Kalavri[2] about graph queries and in particular Timely
> Dataflow[3][4], and a paper about using Datalog for graph processing
> [5].
>
> A few observations:
> * Graph queries require repeated (unbounded) joins, and so are beyond
> SQL:92.
> * Datalog allows recursion, and can therefore compute things like
> transitive closure, and is therefore powerful enough for graph
> queries.
> * SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of
> queries. However, for a variety of reasons, people tend not to use SQL
> for querying graph databases.
> * Datalog is more than just recursive queries. For instance, it is
> popular with academics analyzing the power/complexity of languages.
> And it can do deductive queries.
> * There are different "strengths" of Datalog. Some variants support
> only linearizable recursion; most variants only support queries whose
> results are stratified (i.e. can be defined using well-founded
> induction, necessary when negations are involved).
> * The "big data" languages (Hadoop, Spark, Flink, even Pig) have all
> discussed how they can handle graph queries and iterative computation.
> However they have mainly focused on improvements to their engine and
> physical algebra, not looked at logical algebra or query language.
> * If Calcite's algebra could express deductive query / recursive query
> / iteration, then not only would Datalog be possible, but also SQL's
> WITH RECURSIVE, and also SPARQL.
>
> So, questions on my mind:
> * What new operator(s) would we add to Calcite's algebra to enable
> recursive query?
> * What optimization rules are possible/necessary for graph queries?
> * How much effort would it be to add a Datalog parser to Calcite and
> translate to Calcite's algebra?
>
> Julian
>
> [1] http://www.dataengconf.com/scalability-but-at-what-cost
>
> [2] https://qconsf.com/sf2017/speakers/vasia-kalavri
>
> [3] https://github.com/frankmcsherry/timely-dataflow
>
> [4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf
>
> [5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
>

Reply via email to