There is a pending pull request [1] to support table statistics. This includes using HyperLogLog to estimate the number of distinct values, etc. I do not know further details.
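For a rough sense of how HyperLogLog estimates distinct counts, here is a minimal sketch in Python. It is not taken from the pull request or from Drill; the class name `HyperLogLog`, the parameter `p`, and the use of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        if p < 7:
            raise ValueError("the alpha formula used here assumes p >= 7")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (arbitrary choice).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # top p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max: the merged sketch equals the sketch of the union.
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog()
    for i in range(100000):
        sketch.add(f"user-{i}")
    print(round(sketch.estimate()))  # an estimate near 100000, not exact
```

Because `merge` is just a register-wise maximum, sketches built on separate partitions can be combined afterwards, which is what makes this kind of structure a natural fit for multi-level aggregation.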

Thank you,
Sudheesh

[1] https://github.com/apache/drill/pull/425

> On May 1, 2016, at 7:26 PM, Edmon Begoli <[email protected]> wrote:
>
> Yes, I am preparing a research seminar, and I am doing a survey of the uses
> of probabilistic and synopsis data structures in post-Hadoop "Big Data"
> technologies.
>
> On Sun, May 1, 2016 at 8:34 PM, Julian Hyde <[email protected]> wrote:
>
>> Drill also makes use of hash tables and hash partitioning.
>>
>> I’m not sure what was the purpose of your question. Are you carrying out a
>> survey?
>>
>> Julian
>>
>>
>>> On May 1, 2016, at 5:22 PM, Ted Dunning <[email protected]> wrote:
>>>
>>> Drill doesn't use any such data structures in itself. The emphasis has been
>>> on being correct first with the option of introducing approximations later.
>>>
>>> That said, you can definitely define aggregators yourself. Last I checked,
>>> however, user defined aggregators are single level ... that means that
>>> everything that gets aggregated has to go through a single function which
>>> definitely limits scalability. This was several months ago, though, so
>>> things may have improved by now.
>>>
>>> Perhaps somebody can comment on whether multi-level user-defined
>>> aggregators are possible?
>>>
>>>
>>>
>>> On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <[email protected]> wrote:
>>>
>>>> Is Drill using any of the probabilistic data structures [1], and if so -
>>>> which ones and how?
>>>>
>>>> Thank you,
>>>> Edmon
>>>>
>>>> 1. Probabilistic Data Structures -
>>>> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>>>>
