There are many possible analyzers, but Lattice Suggester is one that I am 
working on and is relatively mature. It looks at lots of queries and builds 
lattices (star schemas with measures) from those queries. It finds commonality 
by “growing” lattices - adding measures, adding derived expressions, adding 
many-to-one joins. Lattice Suggester takes a set of SQL query strings, then it 
parses them (to SqlNode), validates, and converts to relational algebra 
(RelNode). Then it looks for patterns in the relational algebra. Working at the 
algebra level as opposed to the SQL parse tree is a net benefit, but some 
things (e.g. figuring out the original column alias for an expression) are a 
bit more difficult.

Lattice suggester is under development in my 
https://github.com/julianhyde/calcite/tree/1870-lattice-suggester 
<https://github.com/julianhyde/calcite/tree/1870-lattice-suggester> branch. The 
tests pass, and I hope to have it merged into master in the next month or two.

I agree with Devjyoti that it’s difficult to gather together all possible 
analyses in one tool. Lattice Suggester is a good base for analyses that model 
queries as a filter/project/aggregate of a pre-joined star schema — a very 
common model in BI; for example, it could model which combinations of columns 
are commonly used as filters.

For analyses that are not tied to star schemas, feel free to create new tools. 
The tools would benefit from collaborative development, and I think that 
Calcite would be a good home for them.

Julian




> On Jul 25, 2018, at 10:28 PM, Devjyoti Patra <[email protected]> wrote:
> 
> Hi Zheng,
> 
> At Qubole, we are building something very similar to what you are looking
> for. And from experience, I can tell you that it is a lot easy to build it
> than what one may think.
> We use Calcite parser to parse the SQL into Sqlnode and then use different
> tree visitors to extract query attributes like  tables, filter columns,
> joins, subqueries etc.,
> 
> Our approach is very similar to Uber's QueryParser project (
> https://github.com/uber/queryparser ), but we go deeper in our analysis of
> finding queries that are semantically similar to some canonicalized form.
> If you intend to begin from scratch, I can give you some pointers to get
> started.
> 
> Thanks,
> Devjyoti
> 
> 
> On Thu, Jul 26, 2018 at 9:37 AM, Zheng Shao <[email protected]> wrote:
> 
>> Hi,
>> 
>> We are thinking about starting a project to analyze huge number of SQL
>> queries (think millions) to identify common patterns:
>> * Common sub queries
>> * Common filtering conditions (columns) for a table
>> * Common join keys for table pairs
>> 
>> Are there any existing projects on that direction using Calcite?  Would
>> love to leverage instead of building from scratch.
>> 
>> Zheng
>> 

Reply via email to