@Julian thanks for your reply.
Another question is about `Star schema` requirement. Does this precondition
only affect `Lattice.computeTiles()` method to choose the right dimension
group to be the candidate Tile ?

On Fri, Jul 7, 2017 at 4:44 AM, Julian Hyde <[email protected]> wrote:

>
> > On Jun 28, 2017, at 10:58 PM, weijie tong <[email protected]>
> wrote:
> >
> > HI all:
> >   anyone can explain  the detail of the MonteCarlo algorithm to compute
> > the tiles of a Lattice?
> >    It seems that  MonteCarlo algorithm will simulate every possible query
> > of all kind of  AggregateImpls ,and will choose the lowest cost's ( cost
> > model determined by the estimateCost() method of LatticeImpl )
> > AggregateImpl to be the titles.
>
> ExhaustiveLatticeAlgorithm will try every possible query (2^n if there are
> n attributes), whereas MonteCarloAlgorithm tries a set of random queries.
>
> Both algorithms are greedy algorithms. Each iteration, they assume that a
> set of aggregates have been chosen, and choose the best aggregate to add to
> it by calling getBenefit (which, despite its name, is a cost-benefit
> ratio). Repeat until there are enough aggregates.
>
>
> >    I also find that the cost benefits of the choose AggregateImpls don't
> > play any role to the final output AggregateImpl.
>
> If you’re referring to the list of CostBenefit objects created at the end
> of the algorithm; yes, they are just info to put on the screen and prove
> that the algorithm has done a great job.
>
> But you’ll see that getBenefit is called in the inner loop.
>
>
> >    please correct my opinion and show me the mathematical theory of the
> > MonteCarlo algorithm to  choose the best  aggregates .
>
> MonteCarloAlgorithm could be improved by taking into account historic
> queries, but I think it does a good job for the case where no previous
> queries are not known.
>
> The biggest problem with the algorithm is the amount of time spent
> gathering statistics. My work on data profiling [1] [2] will speed up
> getBenefit hugely because it will be able to answer
> aggregate.estimateRowCount() without executing a query.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-1616 <
> https://issues.apache.org/jira/browse/CALCITE-1616>
>
> [2] https://www.slideshare.net/julianhyde/data-profiling-
> with-apache-calcite <https://www.slideshare.net/julianhyde/data-profiling-
> with-apache-calcite>
>
>

Reply via email to