Hello all,

Let me introduce myself again. I’m Riccardo Tommasini, Politecnico di Milano.
I do research on Stream Reasoning, graph stream processing.
In particular, RDF stream processing and i work with temporal datalog.

This is a quite exciting news.

Going technical, a datalog frontend that stays in the relational algebra world 
would be quite easy as java datalog parser exists and, for the non recursive 
fragment, is totally implementable within calcite.
Actually, University of Bolzano is already working on such a process using 
calcite. I cced Guohui Xiao, a postdoc who might be interested.

Regarding graph specific queries and SPARQL. I’m quite found of the subject and 
i think there might be some issues to deal with the query planning. I’m not 
super expert of calcite though but i would be happy to help.

A final remark on efficiency. All the mentioned features are very interesting 
but not computationally nice. Idk what’s calcite position on this, but in a big 
data community i think we should be careful. SPARQL did some crazy stuff and is 
still paying them (see gutierrez papers)

RT

On 15 Dec 2017, 19:26 +0100, Julian Hyde <[email protected]>, wrote:
I've been thinking about Datalog front end to Calcite. How much effort
would it be? Would there be an audience who would find it useful?

The genesis of the idea is talks by Frank McSherry[1] and Vasia
Kalavri[2] about graph queries and in particular Timely
Dataflow[3][4], and a paper about using Datalog for graph processing
[5].

A few observations:
* Graph queries require repeated (unbounded) joins, and so are beyond SQL:92.
* Datalog allows recursion, and can therefore compute things like
transitive closure, and is therefore powerful enough for graph
queries.
* SQL:99 added 'WITH RECURSIVE' so can handle a pretty useful class of
queries. However, for a variety of reasons, people tend not to use SQL
for querying graph databases.
* Datalog is more than just recursive queries. For instance, it is
popular with academics analyzing the power/complexity of languages.
And it can do deductive queries.
* There are different "strengths" of Datalog. Some variants support
only linearizable recursion; most variants only support queries whose
results are stratified (i.e. can be defined using well-founded
induction, necessary when negations are involved).
* The "big data" languages (Hadoop, Spark, Flink, even Pig) have all
discussed how they can handle graph queries and iterative computation.
However they have mainly focused on improvements to their engine and
physical algebra, not looked at logical algebra or query language.
* If Calcite's algebra could express deductive query / recursive query
/ iteration, then not only would Datalog be possible, but also SQL's
WITH RECURSIVE, and also SPARQL.

So, questions on my mind:
* What new operator(s) would we add to Calcite's algebra to enable
recursive query?
* What optimization rules are possible/necessary for graph queries?
* How much effort would it be to add a Datalog parser to Calcite and
translate to Calcite's algebra?

Julian

[1] http://www.dataengconf.com/scalability-but-at-what-cost

[2] https://qconsf.com/sf2017/speakers/vasia-kalavri

[3] https://github.com/frankmcsherry/timely-dataflow

[4] http://sigops.org/sosp/sosp13/papers/p439-murray.pdf

[5] http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf

Reply via email to