An additional remark: suppose we do not follow the star schema (for example, two fact tables joined together, both with dimension columns) and we materialize only the explicitly defined tiles of a cube (with the `algorithm` option set to false). Will a user's join query then be correctly translated into a query on the materialized cube?
On Mon, Jul 10, 2017 at 7:22 PM, weijie tong <[email protected]> wrote:

> @Julian thanks for your reply.
> Another question is about the `Star schema` requirement. Does this
> precondition only affect how the `Lattice.computeTiles()` method chooses
> the right dimension group to be the candidate Tile?
>
> On Fri, Jul 7, 2017 at 4:44 AM, Julian Hyde <[email protected]> wrote:
>
>> > On Jun 28, 2017, at 10:58 PM, weijie tong <[email protected]> wrote:
>> >
>> > Hi all:
>> > Can anyone explain the details of the MonteCarlo algorithm used to
>> > compute the tiles of a Lattice?
>> > It seems that the MonteCarlo algorithm simulates every possible query
>> > over all kinds of AggregateImpls, and chooses the lowest-cost
>> > AggregateImpls (cost model determined by the estimateCost() method of
>> > LatticeImpl) to be the tiles.
>>
>> ExhaustiveLatticeAlgorithm will try every possible query (2^n if there
>> are n attributes), whereas MonteCarloAlgorithm tries a set of random
>> queries.
>>
>> Both algorithms are greedy algorithms. In each iteration, they assume
>> that a set of aggregates has been chosen, and choose the best aggregate
>> to add to it by calling getBenefit (which, despite its name, is a
>> cost-benefit ratio). Repeat until there are enough aggregates.
>>
>> > I also find that the cost benefits of the chosen AggregateImpls don't
>> > play any role in the final output AggregateImpl.
>>
>> If you’re referring to the list of CostBenefit objects created at the
>> end of the algorithm; yes, they are just info to put on the screen and
>> prove that the algorithm has done a great job.
>>
>> But you’ll see that getBenefit is called in the inner loop.
>>
>> > Please correct my opinion and show me the mathematical theory of the
>> > MonteCarlo algorithm to choose the best aggregates.
>>
>> MonteCarloAlgorithm could be improved by taking into account historic
>> queries, but I think it does a good job for the case where no previous
>> queries are known.
>>
>> The biggest problem with the algorithm is the amount of time spent
>> gathering statistics. My work on data profiling [1] [2] will speed up
>> getBenefit hugely because it will be able to answer
>> aggregate.estimateRowCount() without executing a query.
>>
>> Julian
>>
>> [1] https://issues.apache.org/jira/browse/CALCITE-1616
>>
>> [2] https://www.slideshare.net/julianhyde/data-profiling-with-apache-calcite
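
To make the greedy selection Julian describes concrete, here is a minimal sketch in Python. It is not Calcite's code: every function name, the row-count estimate (product of attribute cardinalities, capped at the fact table size), and the "rows saved per row stored" ratio are assumptions made only for illustration. The only parts taken from the thread are the overall shape: the exhaustive variant considers all 2^n attribute subsets, the Monte Carlo variant considers a random sample, and each iteration adds the candidate with the best ratio given what is already chosen.

```python
import itertools
import random


def estimate_rows(attrs, cardinality, fact_rows):
    """Hypothetical estimate: product of cardinalities, capped at the fact table size."""
    rows = 1
    for a in attrs:
        rows *= cardinality[a]
    return min(rows, fact_rows)


def query_cost(query, chosen, cardinality, fact_rows):
    """Rows scanned: the smallest chosen aggregate covering the query, else the fact table."""
    costs = [estimate_rows(agg, cardinality, fact_rows) for agg in chosen if query <= agg]
    return min(costs, default=fact_rows)


def candidates_exhaustive(attrs):
    """All 2^n attribute subsets."""
    return [frozenset(c) for r in range(len(attrs) + 1)
            for c in itertools.combinations(attrs, r)]


def candidates_monte_carlo(attrs, samples, rng):
    """Random attribute subsets; each attribute is kept with probability 1/2."""
    return list({frozenset(a for a in attrs if rng.random() < 0.5) for _ in range(samples)})


def choose_tiles(candidates, queries, cardinality, fact_rows, max_tiles):
    """Greedy loop: repeatedly add the candidate with the best saved-rows / stored-rows ratio."""
    chosen = []
    for _ in range(max_tiles):
        base = sum(query_cost(q, chosen, cardinality, fact_rows) for q in queries)
        best, best_ratio = None, 0.0
        for cand in candidates:
            if cand in chosen:
                continue
            with_cand = chosen + [cand]
            saved = base - sum(query_cost(q, with_cand, cardinality, fact_rows) for q in queries)
            ratio = saved / estimate_rows(cand, cardinality, fact_rows)
            if ratio > best_ratio:
                best, best_ratio = cand, ratio
        if best is None:  # no remaining candidate saves anything
            break
        chosen.append(best)
    return chosen
```

In this sketch, the exhaustive variant is `choose_tiles(candidates_exhaustive(attrs), ...)` with the candidate list also used as the query workload, and the Monte Carlo variant swaps in `candidates_monte_carlo(attrs, samples, random.Random(seed))`. Because the ratio is recomputed against the aggregates already chosen, a candidate that looked good in the first round can become worthless once a smaller aggregate covers the same queries.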
