[ 
https://issues.apache.org/jira/browse/CALCITE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754293#comment-16754293
 ] 

Julian Hyde edited comment on CALCITE-2812 at 1/28/19 7:35 PM:
---------------------------------------------------------------

For reference, here's what I wrote in my email thread over a year ago:

{quote}I've been thinking about Datalog front end to Calcite. How much effort
would it be? Would there be an audience who would find it useful?

The genesis of the idea is talks by [Frank 
McSherry|http://www.dataengconf.com/scalability-but-at-what-cost] and [Vasia 
Kalavri| https://qconsf.com/sf2017/speakers/vasia-kalavri] about graph queries 
and in particular [Timely|https://github.com/frankmcsherry/timely-dataflow] 
[Dataflow|http://sigops.org/sosp/sosp13/papers/p439-murray.pdf], and a paper 
about using [Datalog for graph 
processing|http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf].

A few observations:
* Graph queries require repeated (unbounded) joins, and so are beyond SQL:92.
* Datalog allows recursion, and can therefore compute things like transitive 
closure, and is therefore powerful enough for graph queries.
* SQL:99 added {{WITH RECURSIVE}} so can handle a pretty useful class of 
queries. However, for a variety of reasons, people tend not to use SQL for 
querying graph databases.
* Datalog is more than just recursive queries. For instance, it is popular with 
academics analyzing the power/complexity of languages. And it can do deductive 
queries.
* There are different "strengths" of Datalog. Some variants support only 
linearizable recursion; most variants only support queries whose results are 
stratified (i.e. can be defined using well-founded induction, necessary when 
negations are involved).
* The "big data" languages (Hadoop, Spark, Flink, even Pig) have all discussed 
how they can handle graph queries and iterative computation. However they have 
mainly focused on improvements to their engine and physical algebra, not looked 
at logical algebra or query language.
* If Calcite's algebra could express deductive query / recursive query / 
iteration, then not only would Datalog be possible, but also SQL's {{WITH 
RECURSIVE}}, and also SPARQL.

So, questions on my mind:
* What new operator(s) would we add to Calcite's algebra to enable recursive 
query?
* What optimization rules are possible/necessary for graph queries?
* How much effort would it be to add a Datalog parser to Calcite and translate 
to Calcite's algebra?{quote}

[~rubenql]'s proposed RecursiveUnion and DeltaTableScan operators seem to 
handle the case of recursive/union queries and are welcome; but I'm still 
looking for the (meta) operator "Iterate", which can handle a wider class of 
deductive query.



was (Author: julianhyde):
For reference, here's what I wrote in my email thread over a year ago:

{quote}I've been thinking about Datalog front end to Calcite. How much effort
would it be? Would there be an audience who would find it useful?

The genesis of the idea is talks by [Frank 
McSherry|http://www.dataengconf.com/scalability-but-at-what-cost] and [Vasia 
Kalavri| https://qconsf.com/sf2017/speakers/vasia-kalavri] about graph queries 
and in particular [Timely|https://github.com/frankmcsherry/timely-dataflow] 
[http://sigops.org/sosp/sosp13/papers/p439-murray.pdf|Dataflow], and a paper 
about using [Datalog for graph 
processing|http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf].

A few observations:
* Graph queries require repeated (unbounded) joins, and so are beyond SQL:92.
* Datalog allows recursion, and can therefore compute things like transitive 
closure, and is therefore powerful enough for graph queries.
* SQL:99 added {{WITH RECURSIVE}} so can handle a pretty useful class of 
queries. However, for a variety of reasons, people tend not to use SQL for 
querying graph databases.
* Datalog is more than just recursive queries. For instance, it is popular with 
academics analyzing the power/complexity of languages. And it can do deductive 
queries.
* There are different "strengths" of Datalog. Some variants support only 
linearizable recursion; most variants only support queries whose results are 
stratified (i.e. can be defined using well-founded induction, necessary when 
negations are involved).
* The "big data" languages (Hadoop, Spark, Flink, even Pig) have all discussed 
how they can handle graph queries and iterative computation. However they have 
mainly focused on improvements to their engine and physical algebra, not looked 
at logical algebra or query language.
* If Calcite's algebra could express deductive query / recursive query / 
iteration, then not only would Datalog be possible, but also SQL's {{WITH 
RECURSIVE}}, and also SPARQL.

So, questions on my mind:
* What new operator(s) would we add to Calcite's algebra to enable recursive 
query?
* What optimization rules are possible/necessary for graph queries?
* How much effort would it be to add a Datalog parser to Calcite and translate 
to Calcite's algebra?{quote}

[~rubenql]'s proposed RecursiveUnion and DeltaTableScan operators seem to 
handle the case of recursive/union queries and are welcome; but I'm still 
looking for the (meta) operator "Iterate", which can handle a wider class of 
deductive query.


> Add algebraic operators to allow expressing recursive queries
> -------------------------------------------------------------
>
>                 Key: CALCITE-2812
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2812
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.18.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: next
>
>
> In order to parse, optimize, and execute, recursive queries, expressed in 
> SQL, datalog, SPARQL, or other high level language we need first to be able 
> to represent recursive queries in relational algebra.
> The subject has been previously discussed in the dev list (see thread with 
> title [Recursive query, graph query, 
> Datalog|http://mail-archives.apache.org/mod_mbox/calcite-dev/201712.mbox/%3CCAPSgeESFyih9_hf9=uMWFN00BCR7sjf0T+FRY2=ary3ygm1...@mail.gmail.com%3E])
>  where various ideas  and optimizations were proposed. 
> In this issue, we attempt to address only the algebraic part providing the 
> following:
> # logical operator(s) for expressing recursion;
> # naive physical operator(s) for the Enumerable convention;
> # ability to create a recusive plan using the RelBuilder. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to