Re: Questions about MonteCarloAlgorithm to compute tiles of Lattice

Julian Hyde Thu, 06 Jul 2017 13:45:06 -0700

> On Jun 28, 2017, at 10:58 PM, weijie tong <[email protected]> wrote:
> 
> HI all:
>   anyone can explain  the detail of the MonteCarlo algorithm to compute
> the tiles of a Lattice?
>    It seems that  MonteCarlo algorithm will simulate every possible query
> of all kind of  AggregateImpls ,and will choose the lowest cost's ( cost
> model determined by the estimateCost() method of LatticeImpl )
> AggregateImpl to be the titles.


ExhaustiveLatticeAlgorithm will try every possible query (2^n if there are n 
attributes), whereas MonteCarloAlgorithm tries a set of random queries.

Both algorithms are greedy algorithms. Each iteration, they assume that a set 
of aggregates have been chosen, and choose the best aggregate to add to it by 
calling getBenefit (which, despite its name, is a cost-benefit ratio). Repeat 
until there are enough aggregates.

 
>    I also find that the cost benefits of the choose AggregateImpls don't
> play any role to the final output AggregateImpl.

If you’re referring to the list of CostBenefit objects created at the end of 
the algorithm; yes, they are just info to put on the screen and prove that the 
algorithm has done a great job.

But you’ll see that getBenefit is called in the inner loop.


>    please correct my opinion and show me the mathematical theory of the
> MonteCarlo algorithm to  choose the best  aggregates .

MonteCarloAlgorithm could be improved by taking into account historic queries, 
but I think it does a good job for the case where no previous queries are not 
known.

The biggest problem with the algorithm is the amount of time spent gathering 
statistics. My work on data profiling [1] [2] will speed up getBenefit hugely 
because it will be able to answer aggregate.estimateRowCount() without 
executing a query.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-1616 
<https://issues.apache.org/jira/browse/CALCITE-1616>

[2] https://www.slideshare.net/julianhyde/data-profiling-with-apache-calcite 
<https://www.slideshare.net/julianhyde/data-profiling-with-apache-calcite>

Re: Questions about MonteCarloAlgorithm to compute tiles of Lattice

Reply via email to