additional remarks,
I mean if we do not follow the Star schema ( maybe two fact tables to join,
both with dimension columns) ,we only materialize the provided defined
tiles of a cube ( let algorithm option to be false), then a user's join
query will  correctly be transferred to the materialized cube's query ?

On Mon, Jul 10, 2017 at 7:22 PM, weijie tong <[email protected]>
wrote:

> @Julian thanks for your reply.
> Another question is about `Star schema` requirement. Does this
> precondition only affect `Lattice.computeTiles()` method to choose the
> right dimension group to be the candidate Tile ?
>
> On Fri, Jul 7, 2017 at 4:44 AM, Julian Hyde <[email protected]> wrote:
>
>>
>> > On Jun 28, 2017, at 10:58 PM, weijie tong <[email protected]>
>> wrote:
>> >
>> > HI all:
>> >   anyone can explain  the detail of the MonteCarlo algorithm to compute
>> > the tiles of a Lattice?
>> >    It seems that  MonteCarlo algorithm will simulate every possible
>> query
>> > of all kind of  AggregateImpls ,and will choose the lowest cost's ( cost
>> > model determined by the estimateCost() method of LatticeImpl )
>> > AggregateImpl to be the titles.
>>
>> ExhaustiveLatticeAlgorithm will try every possible query (2^n if there
>> are n attributes), whereas MonteCarloAlgorithm tries a set of random
>> queries.
>>
>> Both algorithms are greedy algorithms. Each iteration, they assume that a
>> set of aggregates have been chosen, and choose the best aggregate to add to
>> it by calling getBenefit (which, despite its name, is a cost-benefit
>> ratio). Repeat until there are enough aggregates.
>>
>>
>> >    I also find that the cost benefits of the choose AggregateImpls don't
>> > play any role to the final output AggregateImpl.
>>
>> If you’re referring to the list of CostBenefit objects created at the end
>> of the algorithm; yes, they are just info to put on the screen and prove
>> that the algorithm has done a great job.
>>
>> But you’ll see that getBenefit is called in the inner loop.
>>
>>
>> >    please correct my opinion and show me the mathematical theory of the
>> > MonteCarlo algorithm to  choose the best  aggregates .
>>
>> MonteCarloAlgorithm could be improved by taking into account historic
>> queries, but I think it does a good job for the case where no previous
>> queries are not known.
>>
>> The biggest problem with the algorithm is the amount of time spent
>> gathering statistics. My work on data profiling [1] [2] will speed up
>> getBenefit hugely because it will be able to answer
>> aggregate.estimateRowCount() without executing a query.
>>
>> Julian
>>
>> [1] https://issues.apache.org/jira/browse/CALCITE-1616 <
>> https://issues.apache.org/jira/browse/CALCITE-1616>
>>
>> [2] https://www.slideshare.net/julianhyde/data-profiling-with-
>> apache-calcite <https://www.slideshare.net/ju
>> lianhyde/data-profiling-with-apache-calcite>
>>
>>
>

Reply via email to