Re: Probabilistic data structures in Drill

Sudheesh Katkam Mon, 02 May 2016 10:15:14 -0700

There is a pending pull request [1] to support table statistics. This includes 
using HyperLogLog to estimate the number of distinct values, etc. I do not know 
any further details.
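
For readers unfamiliar with HyperLogLog, here is a minimal Python sketch of
the general technique. It is not the code from the pull request and not
Drill's API. The class name, the choice of SHA-1 as the hash, and the
default precision p=12 are all assumptions made for illustration. The
merge() method shows why such sketches suit multi-level aggregation:
partial sketches built on separate data fragments combine into the same
result as one sketch built over all the data.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical; not Drill's code)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p                      # number of bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Assumption: SHA-1 truncated to 64 bits stands in for a fast hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                    # top p bits select the register
        rest = h & ((1 << width) - 1)       # remaining bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Combining two sketches is an element-wise max of the registers.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b)
                          for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Usage: two partial sketches (e.g. one per fragment) merged into one.
left, right = HyperLogLog(), HyperLogLog()
for i in range(50_000):
    left.add(i)
for i in range(25_000, 100_000):
    right.add(i)
left.merge(right)
approx_distinct = left.estimate()   # approximates 100,000 distinct values
```

With 4096 registers the expected relative error is roughly 1.04/sqrt(4096),
or about 1.6%, at a cost of one small integer per register.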


Thank you,
Sudheesh

[1] https://github.com/apache/drill/pull/425

> On May 1, 2016, at 7:26 PM, Edmon Begoli <[email protected]> wrote:
> 
> Yes, I am preparing a research seminar, and I am doing a survey of the uses
> of probabilistic and synopsis data structures in post-Hadoop "Big Data"
> technologies.
> 
> On Sun, May 1, 2016 at 8:34 PM, Julian Hyde <[email protected]> wrote:
> 
>> Drill also makes use of hash tables and hash partitioning.
>> 
>> I’m not sure what the purpose of your question was. Are you carrying out a
>> survey?
>> 
>> Julian
>> 
>> 
>>> On May 1, 2016, at 5:22 PM, Ted Dunning <[email protected]> wrote:
>>> 
>>> Drill doesn't use any such data structures in itself. The emphasis has been
>>> on being correct first with the option of introducing approximations later.
>>> 
>>> That said, you can definitely define aggregators yourself. Last I checked,
>>> however, user-defined aggregators are single-level: everything that gets
>>> aggregated has to go through a single function, which definitely limits
>>> scalability. That was several months ago, though, so things may have
>>> improved by now.
>>> 
>>> Perhaps somebody can comment on whether multi-level user-defined
>>> aggregators are possible?
>>> 
>>> 
>>> 
>>> On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <[email protected]> wrote:
>>> 
>>>> Is Drill using any of the probabilistic data structures [1], and if so -
>>>> which ones and how?
>>>> 
>>>> Thank you,
>>>> Edmon
>>>> 
>>>> 1. Probabilistic Data Structures -
>>>> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>>>> 
>> 
>>
