@Julian thanks for your reply. Another question is about `Star schema` requirement. Does this precondition only affect `Lattice.computeTiles()` method to choose the right dimension group to be the candidate Tile ?
On Fri, Jul 7, 2017 at 4:44 AM, Julian Hyde <[email protected]> wrote: > > > On Jun 28, 2017, at 10:58 PM, weijie tong <[email protected]> > wrote: > > > > HI all: > > anyone can explain the detail of the MonteCarlo algorithm to compute > > the tiles of a Lattice? > > It seems that MonteCarlo algorithm will simulate every possible query > > of all kind of AggregateImpls ,and will choose the lowest cost's ( cost > > model determined by the estimateCost() method of LatticeImpl ) > > AggregateImpl to be the titles. > > ExhaustiveLatticeAlgorithm will try every possible query (2^n if there are > n attributes), whereas MonteCarloAlgorithm tries a set of random queries. > > Both algorithms are greedy algorithms. Each iteration, they assume that a > set of aggregates have been chosen, and choose the best aggregate to add to > it by calling getBenefit (which, despite its name, is a cost-benefit > ratio). Repeat until there are enough aggregates. > > > > I also find that the cost benefits of the choose AggregateImpls don't > > play any role to the final output AggregateImpl. > > If you’re referring to the list of CostBenefit objects created at the end > of the algorithm; yes, they are just info to put on the screen and prove > that the algorithm has done a great job. > > But you’ll see that getBenefit is called in the inner loop. > > > > please correct my opinion and show me the mathematical theory of the > > MonteCarlo algorithm to choose the best aggregates . > > MonteCarloAlgorithm could be improved by taking into account historic > queries, but I think it does a good job for the case where no previous > queries are not known. > > The biggest problem with the algorithm is the amount of time spent > gathering statistics. My work on data profiling [1] [2] will speed up > getBenefit hugely because it will be able to answer > aggregate.estimateRowCount() without executing a query. > > Julian > > [1] https://issues.apache.org/jira/browse/CALCITE-1616 < > https://issues.apache.org/jira/browse/CALCITE-1616> > > [2] https://www.slideshare.net/julianhyde/data-profiling- > with-apache-calcite <https://www.slideshare.net/julianhyde/data-profiling- > with-apache-calcite> > >
