Yes, I think you idea of an equivalence set is what I'm looking for. On Mon, Nov 30, 2015 at 11:22 AM, Julian Hyde <[email protected]> wrote:
> I think there are divergent definitions of “similar”, and it helps if you > state why you want to detect similar queries. > > For example, by one definition, queries are similar if they are the same > modulo substituting constants. For these queries, you can share plans if > you hoist literals[1]. > > Other queries are similar if they use the same underlying tables. (The > goal here might be security or compliance.) > > Another definition of similarity is “can I satisfy query A using the > results of query B”? Materialized view rewrite is interested in this. > > For some kinds of similarity, I would work on the AST (i.e. SqlNode). For > other, deeper, kinds of similarity I would recommend working at the algebra > level (i.e. RelNode). > > Then there is equivalence sets vs. similarity. I think you’re talking > about equivalence sets. Q1 and Q2 are either in the same equivalence set or > they are not. If they are, then f(Q1) = f(Q2), where f is some function, > say a hash function or canonization function. Other kinds of similarity > (e.g. [2]) might give a “distance” between queries, and queries are deemed > equivalent iff that distance is 0. > > Julian > > [1] https://issues.apache.org/jira/browse/CALCITE-963 > > [2] https://en.wikipedia.org/wiki/Cosine_similarity < > https://en.wikipedia.org/wiki/Cosine_similarity> > > > > On Nov 30, 2015, at 11:08 AM, Nick Dimiduk <[email protected]> wrote: > > > > Hello. > > > > I have a pile of SQL queries. Is it possible to use Calcite to group > these > > queries into "similar" buckets? Perhaps via some hash function on a > parsed > > query? I see some databases' query planners provide this kind of > > functionality, but mine apparently does not. > > > > Thanks, > > Nick > >
