Hey Mark!

Sounds like an interesting thing!
I don't think it's being explored right now... :)
I think you might start with the following approaches:

* you can get access to the PlanMapper, which usually has a full list of
  operator stats/signatures.
  These signatures are designed to be comparable, so you can build a counting
  set from them.
  You might implement this as a hook; as an example to start from, you can
  look at this one:
  https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/RuntimeStatsPersistenceCheckerHook.java
  * pros:
    * I think you can get started with this more easily
    * you also have access to some runtime info about previous executions
    * there are known estimates of how much memory a signature typically
      uses (100 bytes/OP)
  * cons:
    * RelNodes and other things are not (yet) connected to the mapper, so it's
      harder to go back to the Calcite level
    * You have to reconstruct the proposed materialized view suggestion from
      the optree

* the other way I could think of is matching the RelNodes of different plans
  to each other.
  I don't know exactly how to do this, but the concept would be to keep the
  reltree of every executed plan and compare those.
  * pros:
    * recommended materialized view suggestions can probably be generated
      using rel2sql
    * closer to the materialized view application logic
    * this could probably be done using only Calcite
  * cons:
    * memory need is unknown; keeping reltrees might eat up memory
    * I don't know whether these nodes could be used in a set, or how much
      the comparison costs
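
To make the counting-set idea a bit more concrete, here is a minimal sketch in
plain Java. It does not use any Hive or Calcite API: the type parameter S is a
stand-in for whatever comparable key you extract per plan (an operator
signature from the PlanMapper, or some digest of a RelNode subtree), and
SignatureCounter, addPlan, count and sharedBy are names made up for this
example. The only assumption is that the keys implement equals/hashCode
consistently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive or Calcite.
// S stands in for any key with consistent equals/hashCode.
class SignatureCounter<S> {
    // signature -> number of distinct plans in which it appeared
    private final Map<S, Integer> planCounts = new HashMap<>();

    // Record one executed plan; duplicates inside a single plan count once.
    void addPlan(List<S> signatures) {
        for (S sig : new HashSet<>(signatures)) {
            planCounts.merge(sig, 1, Integer::sum);
        }
    }

    // Number of recorded plans that contained the given signature.
    int count(S sig) {
        return planCounts.getOrDefault(sig, 0);
    }

    // Signatures seen in at least minPlans plans: candidates for a closer look.
    List<S> sharedBy(int minPlans) {
        List<S> result = new ArrayList<>();
        for (Map.Entry<S, Integer> e : planCounts.entrySet()) {
            if (e.getValue() >= minPlans) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SignatureCounter<String> counter = new SignatureCounter<>();
        // The strings are placeholders for real signatures.
        counter.addPlan(Arrays.asList("JOIN(a,b)", "FILTER(a.x>1)"));
        counter.addPlan(Arrays.asList("JOIN(a,b)", "FILTER(b.y=2)"));
        System.out.println(counter.sharedBy(2)); // prints [JOIN(a,b)]
    }
}
```

Counting each signature once per plan keeps a single query with a repeated
operator from looking like a pattern shared across queries.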


I feel that the decision whether to introduce a materialized view or not should
also take runtime information into account.
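
One possible way to fold runtime information into that decision, purely as an
illustration: score each repeated fragment by how often it occurs times how
long it took on average, and look at the highest scores first. The scoring
formula, the CandidateRanker and Candidate classes and their fields are all
assumptions of this sketch, not something Hive provides; the runtime numbers
would have to come from whatever runtime stats you collect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ranking helper; not a Hive or Calcite API.
class CandidateRanker {
    static final class Candidate {
        final String signature;  // placeholder for an operator signature or RelNode digest
        final int occurrences;   // number of plans containing the fragment
        final double avgMillis;  // average observed runtime of the fragment

        Candidate(String signature, int occurrences, double avgMillis) {
            this.signature = signature;
            this.occurrences = occurrences;
            this.avgMillis = avgMillis;
        }

        // Assumed heuristic: total time spent recomputing this fragment.
        double score() {
            return occurrences * avgMillis;
        }
    }

    // Returns a new list sorted by descending score; the input is not modified.
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        return sorted;
    }
}
```

With this heuristic a cheap filter that shows up in ten plans can still rank
below an expensive join that shows up in only three, which is the kind of
trade-off a pure frequency count would miss.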

I wish you good luck with it :)

cheers,
Zoltan


On 12/2/18 10:24 AM, mark pasterkamp wrote:
Dear all,

For research purposes I have been looking into finding potential materialized 
views based on a given set of queries. For instance, if you have 2 queries 
joining table A and B, perhaps it would be advantageous to materialize said 
join and utilise Calcite's ability to rewrite queries to use this newly formed 
materialized view.

I have been looking these past few weeks into the Hive source code but I must 
admit, I am a little overwhelmed with the code base. I was wondering if someone 
could perhaps point me to a few classes of interest to get me started.

I have posted a similar question by accident on the user list. I'm sorry if it 
seems like I am spamming the mailing lists because of this.


With kind regards,

Mark
