Re: Basic queries regarding Apache Drill working

Jacques Nadeau Fri, 05 Apr 2013 08:35:34 -0700

Oops, meant to include a reference as an example of streaming algorithms:
https://github.com/clearspring/stream-lib

On Fri, Apr 5, 2013 at 8:34 AM, Jacques Nadeau <[email protected]> wrote:

> The current thinking is that there will be an approximate query flag.
> This will be useful in situations where parallel approximations can be
> made.  The simplest example is a top 10 group by attr1.  Each node can
> do a local top N group by attr1 and the results are then merged.  While not
> exactly right, it can be statistically accurate given the right choice
> of N.  There are also parallel approximations for other things, such as
> median, using streaming algorithms.  The goal is for Drill to be able to use
> these approximation algorithms in a processing tree for more queries.  In
> the case that a user needs exact results, full shuffle/aggregations will
> still need to be done.  Such queries will still benefit from avoiding the
> various MapReduce barriers and requirements for persistence between stages.
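
A minimal sketch of the local top N and merge idea described above, in
Python.  This is not Drill code; the function names (local_top_n,
merge_top_k) and the sample data are hypothetical and chosen only for
illustration.  Each node keeps its N largest groups, and the coordinator
sums the surviving counts and keeps the overall top K.

```python
from collections import Counter


def local_top_n(rows, n):
    """Count groups on one node and keep only the n largest (hypothetical helper)."""
    return Counter(rows).most_common(n)


def merge_top_k(partials, k):
    """Sum the surviving per-node counts and keep the k largest groups."""
    merged = Counter()
    for partial in partials:
        for key, count in partial:
            merged[key] += count
    # Sort by descending count, then by key, so ties are deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up attr1 values held by two nodes.
node_a = ["x", "x", "x", "y", "y", "z"]
node_b = ["y", "y", "y", "z", "z", "x"]

# N = 2: each node drops its smallest group, so the count for "x" is low by one.
approx = merge_top_k([local_top_n(node_a, 2), local_top_n(node_b, 2)], 2)

# N = 3: nothing is dropped on this data, so the merge is exact.
exact = merge_top_k([local_top_n(node_a, 3), local_top_n(node_b, 3)], 2)
```

On this data the N = 2 run still picks the right two groups but undercounts
one of them, which is the sense in which the answer is approximate.  A larger
N trades more data movement for a smaller error.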
>
> J
>
>
> On Thu, Apr 4, 2013 at 10:31 PM, devansh kumar <[email protected]> wrote:
>
>> Hi,
>>
>> I understood what you meant about using SUM and COUNT to
>> calculate AVERAGE.
>> As I understand it, this works well with distributed
>> operations, but what about operations like median?
>>
>> I also wanted to ask how the query will be broken up in
>> the execution engine.
>> I have gone through the Apache Drill documentation and also the Google
>> Dremel paper, and I am still confused about how multiple levels of
>> aggregation will be created inside one tree.
>>
>> Thanks!
>>
>>
>>
>> ________________________________
>>  From: devansh kumar <[email protected]>
>> To: Andrew Brust <[email protected]>; "
>> [email protected]" <[email protected]>; "
>> [email protected]" <[email protected]>
>> Sent: Friday, April 5, 2013 10:18 AM
>> Subject: Re: Basic queries regarding Apache Drill working
>>
>>
>> Hi,
>>
>> As Andrew asked, how will average work without a Reduce operation
>> present?
>> Can you explain more about how the data will be aggregated?
>>
>>
>>
>>
>> ________________________________
>>  From: Andrew Brust <[email protected]>
>> To: "[email protected]" <[email protected]>;
>> devansh kumar <[email protected]>
>> Sent: Thursday, April 4, 2013 8:00 PM
>> Subject: RE: Basic queries regarding Apache Drill working
>>
>> Still not sure I follow (and pardon what must be a very rudimentary
>> misunderstanding on my part) how you get an average across a data set if
>> the data is split across nodes.  With MapReduce, the reducer can get it
>> because all data for a given key is kept to one node.  How would this work
>> with Drill?
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:[email protected]]
>> Sent: Thursday, April 4, 2013 9:27 AM
>> To: [email protected]; devansh kumar
>> Subject: Re: Basic queries regarding Apache Drill working
>>
>> On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I am new and am trying to understand how Apache Drill works, but I
>> > have a few queries.
>> > Can anyone help me understand these things?
>> >
>> > 1.
>> > I am trying to understand if the execution engine is going to break up
>> > the data.
>> >
>>
>> Normally the data will already have been broken up across a cluster.
>>
>>
>> > What will happen if I am trying to do an aggregation operation like
>> > AVERAGE?
>> > How will that work?
>> >
>>
>> Yes.
>>
>>
>> > I have seen operations such as SUM and COUNT.
>> > What will the query execution tree look like in the case of an AVERAGE?
>> >
>>
>> It will look exactly like a SUM or COUNT except that two numbers will be
>> accumulated instead of one.
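
A minimal sketch of "two numbers accumulated instead of one", in Python.
This is not Drill's implementation; AvgState, local_aggregate and merge_all
are hypothetical names, and the partitions are made-up data.  Each node
reduces its own rows to a (sum, count) pair, and the pairs can be merged in
any order and at any level of the tree before the final division.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgState:
    """Partial state for AVERAGE: a running sum and a running count."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        # Fold one input value into the partial state.
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two partial states; associative, so tree depth does not matter.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        # Divide only once, at the root of the tree.
        return self.total / self.count if self.count else None


def local_aggregate(values):
    """Leaf step: reduce one node's rows to a single AvgState."""
    state = AvgState()
    for value in values:
        state = state.add(value)
    return state


def merge_all(states):
    """Intermediate or root step: merge any number of partial states."""
    return reduce(AvgState.merge, states, AvgState())


# Made-up data already split across three nodes.
partitions = [[1.0, 2.0, 3.0], [10.0], [4.0, 4.0]]
partials = [local_aggregate(p) for p in partitions]
average = merge_all(partials).result()
```

Averaging the per-node averages instead would give the wrong answer whenever
the nodes hold different numbers of rows, which is why the count travels with
the sum.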
>>
>>
>> > 2.
>> > Is the resource model optimized compared to MapReduce?
>> >
>>
>> Yes.  This will happen because multiple levels of aggregation can be done
>> in one tree without the barrier between map and reduce imposed by the
>> MapReduce structure.
>>
>
>
