Hi all,

While investigating the performance of Hive query compilation I found some problems around memoization in the GeneratedMetadataHandler_xxx classes. These classes are generated by JaninoRelMetadataProvider, and the generated code memoizes its results, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’ parameter. My measurements show that these calls explode into deep recursive stacks. I measured a complex query (49 joins, plenty of expressions) and some of the numbers look staggering. Take for instance GeneratedMetadataHandler_RowCount.getRowCount:

- It is called 9303 times as a top-level call
- It generates 165754 total recursive calls, up to nesting level 18
- The memo cache is hit 18065 times (key found)
- The memo is populated 147689 times (key missed)
- The function receives no fewer than 22186 distinct RelMetadataQuery `mq` instances (!!)

This situation repeats for each GeneratedMetadataHandler_XXX class:

Class                 Top Calls  Total Calls  Memo Cache Hits  Memo Cache Misses  Distinct RelMetadataQuery Instances
getMaxRowCount        489        105262       6615             98647              489
getDistinctRowCount   6828       74286        6179             68107              5828
areColumnsUnique      19197      149708       13367            136341             2139
getColumnOrigins      250        3021         39               2982               24
getCumulativeCost     2249       9267         4660             4607               1
getNonCumulativeCost  3559       3559         1285             3559               18
getSelectivity        15636      35727        1114             34613              4984
getUniqueKeys         26311      111715       5212             106503             3266

Looking at this, the root problem seems to be that the code often uses the construct `RelMetadataQuery.instance()` to obtain a reference to the object it needs. But each call to `instance()` returns a new object, and this new object has a new, clean memoization `map` field. So we get a very poor memo cache hit ratio, but far worse is the effect of repeating recursive calls ad nauseam on deep trees.

I ran an experiment in which I modified the code in `RelMetadataQuery.instance()` to reference a ThreadLocal `map` field, and the difference is like night and day:

Class                 Top Calls  Total Calls  Memo Cache Hits  Memo Cache Misses
getRowCount           8179       14426        12920            1506
getMaxRowCount        489        2535         1327             1208
getDistinctRowCount   3138       7947         4939             3008
areColumnsUnique      1103       3191         1495             1696
getColumnOrigins      250        1273         205              1068
getCumulativeCost     2249       6755         4288             2467
getNonCumulativeCost  2274       2274         0                2274
getSelectivity        539        562          27               535
getUniqueKeys         2635       5248         2437             2811

Gone are the deep recursive calls and the explosion into 100k+ calls. I’m seeing 2x - 10x compile time improvements.

I’m asking here whether there is some reason behind the frequent replacement of the RelMetadataQuery object being used (and hence a clean memo cache), or whether it is just an unintended consequence. I am now making changes on the Hive side to address this (HIVE-16757); if the cache reset effect is accidental, we should address it in Calcite as well.
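
To make the mechanism concrete, here is a minimal standalone Java sketch. It is not Calcite or Hive code: `Node`, `Query`, `rowCount` and `countMisses` are hypothetical stand-ins I made up for illustration, and the thread-local map only approximates the experiment described above. The handler deliberately asks for a new query object for every child lookup, which is my assumption about the pattern that causes the blow-up. On a plan whose sub-trees are shared, a fresh map per instance recomputes everything, while a thread-local map computes each node once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class MemoSketch {
    /** Hypothetical plan node; object identity is the memo key. */
    static final class Node {
        final Node left;
        final Node right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Hypothetical stand-in for RelMetadataQuery: it only carries the memo map. */
    static final class Query {
        private static final ThreadLocal<Map<Node, Double>> SHARED =
            ThreadLocal.withInitial(HashMap::new);
        final Map<Node, Double> map;
        private Query(Map<Node, Double> map) { this.map = map; }

        /** New object with a new, empty memo map. */
        static Query fresh() { return new Query(new HashMap<>()); }
        /** New object that reuses the current thread's memo map. */
        static Query shared() { return new Query(SHARED.get()); }
        static void resetShared() { SHARED.get().clear(); }
    }

    /** Number of memo misses since the last reset. */
    static long misses;

    /** Memoized metadata call; each child lookup obtains its own query object. */
    static double rowCount(Node n, Query mq, Supplier<Query> instance) {
        Double cached = mq.map.get(n);
        if (cached != null) {
            return cached; // memo hit
        }
        misses++;
        double result = 1.0; // leaf
        if (n.left != null) {
            result = rowCount(n.left, instance.get(), instance)
                   + rowCount(n.right, instance.get(), instance);
        }
        mq.map.put(n, result);
        return result;
    }

    /** Builds a chain where both inputs of each node are the same child. */
    static long countMisses(int depth, boolean shareMap) {
        Node root = new Node(null, null);
        for (int i = 0; i < depth; i++) {
            root = new Node(root, root);
        }
        Supplier<Query> instance = shareMap ? Query::shared : Query::fresh;
        Query.resetShared();
        misses = 0;
        rowCount(root, instance.get(), instance);
        return misses;
    }

    public static void main(String[] args) {
        System.out.println("fresh map per instance: " + countMisses(10, false)); // 2047
        System.out.println("thread-local map:       " + countMisses(10, true));  // 11
    }
}
```

With depth 10 the fresh-map variant misses 2^11 - 1 = 2047 times, while the shared variant misses once per node (11). A real plan is not this degenerate, but the shape of the growth matches what the counters above show.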

BTW, I think `RelMetadataQuery.instance()` should be named `RelMetadataQuery.createInstance()` to make its effect clear.

Thanks,
~Remus
