I'm still not sure I follow (and pardon what must be a very rudimentary
misunderstanding on my part) how you get an average across a data set if the
data is split across nodes. With MapReduce, the reducer can compute it because all
data for a given key is sent to one node. How would this work with Drill?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, April 4, 2013 9:27 AM
To: [email protected]; devansh kumar
Subject: Re: Basic queries regarding Apache Drill working

On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <[email protected]> wrote:

> Hi,
>
> I am new to Apache Drill and am trying to understand how it works, but I
> have a few questions.
> Can anyone help me understand these things?
>
> 1.
> I am trying to understand if the execution engine is going to break up 
> the data.
>

Normally the data will already have been broken up across a cluster.


> What will happen if I try an aggregation operation like AVERAGE?
> How will that work?
>

Yes.


> I have seen operations such as SUM and COUNT.
> What will the query execution tree look like in the case of AVERAGE?
>

It will look exactly like a SUM or COUNT, except that two numbers (a running
sum and a row count) will be accumulated instead of one.
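To make this concrete for the follow-up question at the top of the thread,
here is a minimal Python sketch, assuming AVERAGE is decomposed into a
(sum, count) pair. The names `partial_avg`, `merge` and `finalize` are
hypothetical illustrations, not Drill APIs. Each node reduces only its own
slice to a pair, the pairs are added together, and the single division happens
at the end, so no node ever has to see all of the data.

```python
from typing import Iterable, List, Tuple

# Partial aggregate for AVERAGE: (running sum, row count).
Partial = Tuple[float, int]


def partial_avg(values: Iterable[float]) -> Partial:
    """Runs on each node, over only the rows that node already holds."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def merge(partials: Iterable[Partial]) -> Partial:
    """Combines partials from many nodes; sums and counts both just add."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    return total, count


def finalize(p: Partial) -> float:
    """The one division happens once, after all partials are merged."""
    total, count = p
    if count == 0:
        raise ValueError("AVERAGE over zero rows is undefined")
    return total / count


# Hypothetical data, already split across three nodes.
node_slices: List[List[float]] = [[1.0, 2.0, 3.0], [10.0], [4.0, 5.0]]
partials = [partial_avg(s) for s in node_slices]
print(partials)                    # [(6.0, 3), (10.0, 1), (9.0, 2)]
print(finalize(merge(partials)))   # 25.0 / 6 rows
```

The count has to travel with the sum: averaging the three per-node averages
would give (2.0 + 10.0 + 4.5) / 3 = 5.5, not the correct 25 / 6 (about 4.17).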


> 2.
> Is the resource model optimized compared to MapReduce?
>

Yes.  This will happen because multiple levels of aggregation can be done in 
one tree without the barrier between map and reduce imposed by the MapReduce 
structure.
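The multi-level point can be sketched the same way. Assuming the same
(sum, count) partials and a hypothetical `fan_in` parameter (how many inputs
each aggregation step merges), partials are combined level by level in a
tree, and every level applies the same combine step rather than being forced
into exactly one map phase and one reduce phase. This sequential Python only
illustrates the shape of such a tree; it is not Drill's execution code.

```python
from typing import List, Tuple

# (sum, count) partial for AVERAGE, as produced by each leaf fragment.
Partial = Tuple[float, int]


def combine(group: List[Partial]) -> Partial:
    """One aggregation step: add up the sums and the counts of a group."""
    return (sum(s for s, _ in group), sum(c for _, c in group))


def tree_aggregate(leaves: List[Partial], fan_in: int = 2) -> Tuple[Partial, int]:
    """Merge partials level by level until one remains.

    Returns the root partial and the number of levels used. In a real
    engine each group at each level could run on a different node; this
    sequential loop only shows the shape of the tree.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not leaves:
        return (0.0, 0), 0
    level, depth = leaves, 0
    while len(level) > 1:
        level = [combine(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


# Eight hypothetical leaf fragments, each holding one value (1..8).
leaves = [(float(v), 1) for v in range(1, 9)]
root, depth = tree_aggregate(leaves, fan_in=2)
total, count = root
print(root, depth, total / count)  # (36.0, 8) 3 4.5
```

With `fan_in=3` the same eight leaves collapse in two levels (8 to 3 to 1)
and produce the same root partial, since adding sums and counts does not
depend on how the inputs are grouped.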
