Background: I have been preparing to give a talk on DataSketches at FOSDEM 2020. I am doing this for an acquaintance and don't have much background in them but I am learning.
Idea: There is a sketch called the Frequent Distinct Tuples (FDT) sketch [1]. It can estimate the number of occurrences of tuples when the number of occurrences is "frequent".

Ignoring all the hand waving for a moment, and accepting that "frequent" is undefined in this discussion: would it make sense, inside the optimizer, to be able to query which of various values occur most frequently, so that the smallest possible intermediate solutions can be built? From my reading, the FDT sketch can do this. So, given a set of properties, find out which ones occur most frequently and use the others first. Something like that.

DataSketches are in the Apache Incubator. They look interesting and have some interesting properties. I am not sure how applicable they are to us, though.

[1] https://datasketches.github.io/docs/Frequency/FrequentDistinctTuplesSketch.html

Claude

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
