I've been thinking about Datalog front end to Calcite. How much effort would it be? Would there be an audience who would find it useful?
The genesis of the idea is talks by Frank McSherry[1] and Vasia Kalavri[2] about graph queries and in particular Timely Dataflow[3][4], and a paper about using Datalog for graph processing [5]. A few observations: * Graph queries require repeated (unbounded) joins, and so are beyond SQL:92. * Datalog allows recursion, and can therefore compute things like transitive closure, and is therefore powerful enough for graph queries. * SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of queries. However, for a variety of reasons, people tend not to use SQL for querying graph databases. * Datalog is more than just recursive queries. For instance, it is popular with academics analyzing the power/complexity of languages. And it can do deductive queries. * There are different "strengths" of Datalog. Some variants support only linearizable recursion; most variants only support queries whose results are stratified (i.e. can be defined using well-founded induction, necessary when negations are involved). * The "big data" languages (Hadoop, Spark, Flink, even Pig) have all discussed how they can handle graph queries and iterative computation. However they have mainly focused on improvements to their engine and physical algebra, not looked at logical algebra or query language. * If Calcite's algebra could express deductive query / recursive query / iteration, then not only would Datalog be possible, but also SQL's WITH RECURSIVE, and also SPARQL. So, questions on my mind: * What new operator(s) would we add to Calcite's algebra to enable recursive query? * What optimization rules are possible/necessary for graph queries? * How much effort would it be to add a Datalog parser to Calcite and translate to Calcite's algebra? Julian [1] http://www.dataengconf.com/scalability-but-at-what-cost [2] https://qconsf.com/sf2017/speakers/vasia-kalavri [3] https://github.com/frankmcsherry/timely-dataflow [4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf [5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
