Hey Mark! Sounds like an interesting thing! I don't think it's being explored right now... :) I think you might start with one of the following approaches:

* You can get access to the PlanMapper, which usually has a full list of operator stats/signatures. These signatures are designed to be comparable, so you can build a counting set from them. You might start with this as a hook; for example purposes you can start with:
  https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/RuntimeStatsPersistenceCheckerHook.java
  * pros:
    * I think you can get started with this more easily
    * you also have access to some runtime info about previous executions
    * there are known estimates of how much memory a signature typically uses (about 100 bytes per operator)
  * cons:
    * RelNodes and other things are not (yet) connected to the mapper, so it's harder to go back to the Calcite level
    * you have to reconstruct the proposed materialized view suggestion from the operator tree
* The other way I could think of is matching the RelNodes of different plans to each other. I don't know exactly how to do this, but the concept would be to keep the rel tree of every executed plan and compare those.
  * pros:
    * the recommended materialized view suggestions can probably be generated using rel2sql
    * closer to the materialized view application logic
    * this could probably be done using only Calcite
  * cons:
    * the memory need is unknown; keeping rel trees might eat up memory
    * I don't know whether these nodes can be used in a set, or how much the comparison costs

I feel that the decision whether to introduce a materialized view or not should also take runtime information into account.

I wish you good luck with it :)

cheers,
Zoltan

On 12/2/18 10:24 AM, mark pasterkamp wrote:
Dear all,

For research purposes I have been looking into finding potential materialized views based on a given set of queries. For instance, if you have 2 queries joining table A and B, perhaps it would be advantageous to materialize said join and utilise Calcite's ability to rewrite queries to use this newly formed materialized view.

I have been looking these past few weeks into the Hive source code but I must admit, I am a little overwhelmed with the code base. I was wondering if someone could perhaps point me to a few classes of interest to get me started.

I have posted a similar question by accident on the user list. I'm sorry if it seems like I am spamming the mailing lists because of this.

With kind regards,
Mark
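
For illustration, the "counting set" idea from the first suggestion in the reply could look roughly like the sketch below. It is only a sketch under stated assumptions: `SignatureCounter` is a hypothetical helper, not a Hive class, and each operator signature is represented by a plain `String` key standing in for whatever comparable signature object the PlanMapper actually exposes. The sketch counts in how many executed plans each signature occurs and reports the ones that recur, which would be the first candidates to look at for a materialized view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive. A String key stands in for a
// comparable operator signature obtained from the plan of an executed query.
final class SignatureCounter {
    // signature -> number of plans in which it was seen
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one executed plan; a signature repeated inside the same plan
    // is counted only once for that plan.
    void addPlan(List<String> signatures) {
        for (String sig : new HashSet<>(signatures)) {
            counts.merge(sig, 1, Integer::sum);
        }
    }

    // Number of plans that contained the given signature.
    int count(String sig) {
        return counts.getOrDefault(sig, 0);
    }

    // Signatures seen in at least minPlans plans, most frequent first;
    // ties are broken alphabetically so the order is deterministic.
    List<String> candidates(int minPlans) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minPlans) {
                result.add(e.getKey());
            }
        }
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        SignatureCounter counter = new SignatureCounter();
        // Made-up signatures for two queries that both join A and B.
        counter.addPlan(Arrays.asList("JOIN(A,B)", "SCAN(A)", "SCAN(B)"));
        counter.addPlan(Arrays.asList("JOIN(A,B)", "SCAN(A)", "SCAN(B)", "FILTER(x>1)"));
        counter.addPlan(Arrays.asList("SCAN(C)"));
        System.out.println(counter.candidates(2));
    }
}
```

Counting per plan rather than per occurrence is a design choice made here so that a single query with a self-join does not look like a recurring pattern. A real implementation would probably also weight each signature by the runtime statistics mentioned in the reply, and would have to bound the map size given the rough per-operator memory estimate quoted there.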
