Background:
I have been preparing to give a talk on DataSketches at FOSDEM 2020.  I am
doing this for an acquaintance and don't have much background in them, but I
am learning.

Idea:
There is a sketch called the Frequent Distinct Tuples (FDT) sketch[1].  What
it can do is estimate the number of occurrences of tuples when the number
of occurrences is "frequent".  For the moment, ignore all the hand waving
and accept that "frequent" is left undefined in this discussion.
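To make the idea concrete, here is a minimal exact stand-in for the kind of
answer the FDT sketch approximates.  It assumes the sketch's job is, for each
key, to count the distinct values (or tuples) seen with it and to report the
keys with the highest counts.  This is NOT the DataSketches API: the class and
method names are hypothetical, and it keeps every value in memory, which is
exactly what a sketch avoids.

```python
from collections import defaultdict


class ExactDistinctTuples:
    """Hypothetical exact stand-in for what an FDT sketch estimates.

    For each key, remember every distinct value seen with it.  A real sketch
    would keep bounded memory and return estimates instead of exact counts.
    """

    def __init__(self):
        self._values = defaultdict(set)

    def update(self, key, value):
        # Repeated (key, value) pairs are counted only once.
        self._values[key].add(value)

    def distinct_count(self, key):
        # .get() does not trigger the defaultdict factory, so unknown keys
        # are not inserted; they simply report 0.
        return len(self._values.get(key, ()))

    def top_keys(self, n):
        # Keys with the most distinct values first; ties broken by key name.
        ranked = sorted(self._values, key=lambda k: (-len(self._values[k]), k))
        return ranked[:n]


# Hypothetical data: (IP address, user ID) pairs.
fdt = ExactDistinctTuples()
for ip, user in [("10.0.0.1", "alice"), ("10.0.0.1", "bob"),
                 ("10.0.0.1", "alice"), ("10.0.0.2", "carol")]:
    fdt.update(ip, user)

print(fdt.top_keys(1))                 # ['10.0.0.1']
print(fdt.distinct_count("10.0.0.1"))  # 2
```

The sketch replaces the per-key sets with fixed-size summaries, trading
exactness for bounded memory on large streams.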

Would it make sense for the optimizer to be able to query which of several
values occur most frequently, so that the smallest possible intermediate
solutions can be built?  From my reading, the FDT sketch can do this.  That
is, given a set of properties, find out which ones occur most frequently and
use the others first.  Something like that.
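The ordering idea can be sketched as follows, assuming we already have an
estimated occurrence count per property (for example from a frequency
sketch).  The function name and the property names are hypothetical; the
point is only that the rarest properties are evaluated first, which keeps the
early intermediate results small.

```python
def order_by_selectivity(estimated_counts):
    """Return property names ordered from least to most frequent.

    estimated_counts maps a property name to an estimated number of
    occurrences.  Ties are broken by name so the order is deterministic.
    """
    return sorted(estimated_counts, key=lambda prop: (estimated_counts[prop], prop))


# Hypothetical estimates for three properties used in one query.
estimates = {"type": 1_000_000, "name": 50_000, "employee_id": 120}
print(order_by_selectivity(estimates))  # ['employee_id', 'name', 'type']
```

A real optimizer would also weigh join structure and correlations between
properties; this only shows the "least frequent first" heuristic.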

DataSketches are in the Apache Incubator.  They look interesting and have
some useful properties.  I am not sure how applicable they are to us,
though.


[1]
https://datasketches.github.io/docs/Frequency/FrequentDistinctTuplesSketch.html

Claude
-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
