Great initiative. I will also share some comparative performance studies we did at ORNL on different graph processing engines. Could be useful.
On Fri, Dec 15, 2017 at 14:11 Walaa Eldin Moustafa <[email protected]> wrote: > Hi Julian, > > Thanks for referencing our Datalog query processing paper [5]. I have been > thinking about the same idea for a while now too :) I think Calcite is very > well positioned as a generic query optimizer to add Datalog/recursive query > support. Also, it makes a lot of sense since it opens a completely new > dimension for the kind of logic and queries that Calcite can handle, > including but not limited to graph queries, and that can be immediately > available to engines talking to Calcite. > > To answer your questions, we probably need to add a transitive closure > operator. This 1988 paper <http://ieeexplore.ieee.org/document/42731/> by > Rakesh Agrawal proposes the notion of alpha relations, and defines an alpha > operator on top of them which computes the transitive closure of alpha > relations. The operator fits well with the rest of Cod's relational algebra > operators. > > For query optimizations, one of the commonly used Datalog optimizations is > Semi-naive evaluation, where instead of re-evaluating the recursive program > using all existing facts, only new facts inferred since last iteration are > used. Datalog optimizations become much more interesting when introducing > aggregation and negation, and it is still an open research question, but > there is already some tangible progress. Otherwise, as you mentioned > transitive closure is repeated joins, so pretty much many of the join > optimizations apply such as predicate pushdown, and aggregation > pushdown/pull up. > > Regarding the effort, we can always start from basic features and expand > from there. I have already started working on the parser, AST and logical > plan builder for basic Datalog without recursion. I am happy to create a > JIRA ticket to track this effort there. > > Thanks, > Walaa. > > > On Fri, Dec 15, 2017 at 10:26 AM, Julian Hyde <[email protected]> wrote: > > > I've been thinking about Datalog front end to Calcite. How much effort > > would it be? Would there be an audience who would find it useful? > > > > The genesis of the idea is talks by Frank McSherry[1] and Vasia > > Kalavri[2] about graph queries and in particular Timely > > Dataflow[3][4], and a paper about using Datalog for graph processing > > [5]. > > > > A few observations: > > * Graph queries require repeated (unbounded) joins, and so are beyond > > SQL:92. > > * Datalog allows recursion, and can therefore compute things like > > transitive closure, and is therefore powerful enough for graph > > queries. > > * SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of > > queries. However, for a variety of reasons, people tend not to use SQL > > for querying graph databases. > > * Datalog is more than just recursive queries. For instance, it is > > popular with academics analyzing the power/complexity of languages. > > And it can do deductive queries. > > * There are different "strengths" of Datalog. Some variants support > > only linearizable recursion; most variants only support queries whose > > results are stratified (i.e. can be defined using well-founded > > induction, necessary when negations are involved). > > * The "big data" languages (Hadoop, Spark, Flink, even Pig) have all > > discussed how they can handle graph queries and iterative computation. > > However they have mainly focused on improvements to their engine and > > physical algebra, not looked at logical algebra or query language. > > * If Calcite's algebra could express deductive query / recursive query > > / iteration, then not only would Datalog be possible, but also SQL's > > WITH RECURSIVE, and also SPARQL. > > > > So, questions on my mind: > > * What new operator(s) would we add to Calcite's algebra to enable > > recursive query? > > * What optimization rules are possible/necessary for graph queries? > > * How much effort would it be to add a Datalog parser to Calcite and > > translate to Calcite's algebra? > > > > Julian > > > > [1] http://www.dataengconf.com/scalability-but-at-what-cost > > > > [2] https://qconsf.com/sf2017/speakers/vasia-kalavri > > > > [3] https://github.com/frankmcsherry/timely-dataflow > > > > [4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf > > > > [5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf > > >
