I think there are divergent definitions of “similar”, and it helps if you state 
why you want to detect similar queries.

For example, by one definition, queries are similar if they are the same modulo 
substituting constants. For these queries,  you can share plans if you hoist 
literals[1].

Other queries are similar if they use the same underlying tables. (The goal 
here might be security or compliance.)

Another definition of similarity is “can I satisfy query A using the results of 
query B”? Materialized view rewrite is interested in this.

For some kinds of similarity, I would work on the AST (i.e. SqlNode). For 
other, deeper, kinds of similarity I would recommend working at the algebra 
level (i.e. RelNode).

Then there is equivalence sets vs. similarity. I think you’re talking about 
equivalence sets. Q1 and Q2 are either in the same equivalence set or they are 
not. If they are, then f(Q1) = f(Q2), where f is some function, say a hash 
function or canonization function. Other kinds of similarity (e.g. [2]) might 
give a “distance” between queries, and queries are deemed equivalent iff that 
distance is 0.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-963

[2] https://en.wikipedia.org/wiki/Cosine_similarity 
<https://en.wikipedia.org/wiki/Cosine_similarity>


> On Nov 30, 2015, at 11:08 AM, Nick Dimiduk <[email protected]> wrote:
> 
> Hello.
> 
> I have a pile of SQL queries. Is it possible to use Calcite to group these
> queries into "similar" buckets? Perhaps via some hash function on a parsed
> query? I see some databases' query planners provide this kind of
> functionality, but mine apparently does not.
> 
> Thanks,
> Nick

Reply via email to