Great initiative.

I will also share some comparative performance studies we did at ORNL on
different graph processing engines. Could be useful.

On Fri, Dec 15, 2017 at 14:11 Walaa Eldin Moustafa <[email protected]>
wrote:

> Hi Julian,
>
> Thanks for referencing our Datalog query processing paper [5]. I have been
> thinking about the same idea for a while now too :) I think Calcite is very
> well positioned as a generic query optimizer to add Datalog/recursive query
> support. Also, it makes a lot of sense since it opens a completely new
> dimension for the kind of logic and queries that Calcite can handle,
> including but not limited to graph queries, and that can be immediately
> available to engines talking to Calcite.
>
> To answer your questions, we probably need to add a transitive closure
> operator. This 1988 paper <http://ieeexplore.ieee.org/document/42731/> by
> Rakesh Agrawal proposes the notion of alpha relations, and defines an alpha
> operator on top of them which computes the transitive closure of alpha
> relations. The operator fits well with the rest of Cod's relational algebra
> operators.
>
> For query optimizations, one of the commonly used Datalog optimizations is
> Semi-naive evaluation, where instead of re-evaluating the recursive program
> using all existing facts, only new facts inferred since last iteration are
> used. Datalog optimizations become much more interesting when introducing
> aggregation and negation, and it is still an open research question, but
> there is already some tangible progress. Otherwise, as you mentioned
> transitive closure is repeated joins, so pretty much many of the join
> optimizations apply such as predicate pushdown, and aggregation
> pushdown/pull up.
>
> Regarding the effort, we can always start from basic features and expand
> from there. I have already started working on the parser, AST and logical
> plan builder for basic Datalog without recursion. I am happy to create a
> JIRA ticket to track this effort there.
>
> Thanks,
> Walaa.
>
>
> On Fri, Dec 15, 2017 at 10:26 AM, Julian Hyde <[email protected]> wrote:
>
> > I've been thinking about Datalog front end to Calcite. How much effort
> > would it be? Would there be an audience who would find it useful?
> >
> > The genesis of the idea is talks by Frank McSherry[1] and Vasia
> > Kalavri[2] about graph queries and in particular Timely
> > Dataflow[3][4], and a paper about using Datalog for graph processing
> > [5].
> >
> > A few observations:
> > * Graph queries require repeated (unbounded) joins, and so are beyond
> > SQL:92.
> > * Datalog allows recursion, and can therefore compute things like
> > transitive closure, and is therefore powerful enough for graph
> > queries.
> > * SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of
> > queries. However, for a variety of reasons, people tend not to use SQL
> > for querying graph databases.
> > * Datalog is more than just recursive queries. For instance, it is
> > popular with academics analyzing the power/complexity of languages.
> > And it can do deductive queries.
> > * There are different "strengths" of Datalog. Some variants support
> > only linearizable recursion; most variants only support queries whose
> > results are stratified (i.e. can be defined using well-founded
> > induction, necessary when negations are involved).
> > * The "big data" languages (Hadoop, Spark, Flink, even Pig) have all
> > discussed how they can handle graph queries and iterative computation.
> > However they have mainly focused on improvements to their engine and
> > physical algebra, not looked at logical algebra or query language.
> > * If Calcite's algebra could express deductive query / recursive query
> > / iteration, then not only would Datalog be possible, but also SQL's
> > WITH RECURSIVE, and also SPARQL.
> >
> > So, questions on my mind:
> > * What new operator(s) would we add to Calcite's algebra to enable
> > recursive query?
> > * What optimization rules are possible/necessary for graph queries?
> > * How much effort would it be to add a Datalog parser to Calcite and
> > translate to Calcite's algebra?
> >
> > Julian
> >
> > [1] http://www.dataengconf.com/scalability-but-at-what-cost
> >
> > [2] https://qconsf.com/sf2017/speakers/vasia-kalavri
> >
> > [3] https://github.com/frankmcsherry/timely-dataflow
> >
> > [4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf
> >
> > [5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
> >
>

Reply via email to