Re: Basic queries regarding Apache Drill working

devansh kumar Thu, 04 Apr 2013 22:32:27 -0700

Hi,

I understand your point about using SUM and COUNT to calculate AVERAGE.
That works well for distributed operations, but what about operations 
like MEDIAN?
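
To make the question concrete, here is a small Python sketch of why MEDIAN 
does not decompose the way AVERAGE does. The node layout and values are made 
up for illustration, and this says nothing about how Drill itself handles 
MEDIAN; it only shows that combining per-node medians does not, in general, 
give the median of the whole data set.

```python
from statistics import median

# Hypothetical data, split across three nodes.
node_data = [[1, 2, 3], [4, 5, 100], [6]]

# Median over the full data set (requires seeing all values together).
all_values = [v for chunk in node_data for v in chunk]
true_median = median(all_values)

# Naive distributed attempt: take each node's median, then the median of those.
per_node_medians = [median(chunk) for chunk in node_data]
median_of_medians = median(per_node_medians)

print(true_median, median_of_medians)  # the two values differ for this data
```

Unlike a (sum, count) pair, a single per-node median does not carry enough 
information to be merged exactly.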

I also wanted to ask how the query will be broken up in the execution engine.
I have gone through the Apache Drill documentation and the Google Dremel 
paper, and I am still confused about how multiple levels of aggregation 
will be created inside one tree.

Thanks!

________________________________
 From: devansh kumar <[email protected]>
To: Andrew Brust <[email protected]>; 
"[email protected]" <[email protected]>; 
"[email protected]" <[email protected]> 
Sent: Friday, April 5, 2013 10:18 AM
Subject: Re: Basic queries regarding Apache Drill working

Hi,

As Andrew asked, how will AVERAGE work without a Reduce operation? 
Can you explain more about how the data will be aggregated?

________________________________
 From: Andrew Brust <[email protected]>
To: "[email protected]" <[email protected]>; 
devansh kumar <[email protected]> 
Sent: Thursday, April 4, 2013 8:00 PM
Subject: RE: Basic queries regarding Apache Drill working

Still not sure I follow (and pardon what must be a very rudimentary 
misunderstanding on my part) how you get an average across a data set if the 
data is split across nodes.  With MapReduce, the reducer can get it because all 
data for a given key is kept to one node.  How would this work with Drill?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, April 4, 2013 9:27 AM
To: [email protected]; devansh kumar
Subject: Re: Basic queries regarding Apache Drill working

On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <[email protected]>wrote:

> Hi,
>
> I am new and am trying to understand how Apache Drill works, but I 
> have a few queries.
> Can anyone help me understand these things?
>
> 1.
> I am trying to understand if the execution engine is going to break up 
> the data.
>

Normally the data will already have been broken up across a cluster.

> What will happen if I try to do an aggregation operation like AVERAGE?
> How will that work?
>

Yes.

> I have seen operations such as SUM and COUNT.
> What will the query execution tree look like in the case of AVERAGE?
>

It will look exactly like a SUM or COUNT except that two numbers will be 
accumulated instead of one.
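
The idea of accumulating two numbers can be sketched in Python as follows. 
This is a minimal illustration, not Drill's implementation: the function 
names (partial_avg, merge, finalize) and the three-node data layout are 
assumptions made up for the example.

```python
from typing import Iterable, Tuple

Partial = Tuple[float, int]  # (running sum, running count)

def partial_avg(values: Iterable[float]) -> Partial:
    """Computed locally on each node over its own slice of the data."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count

def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial states; grouping and order do not matter."""
    return a[0] + b[0], a[1] + b[1]

def finalize(state: Partial) -> float:
    """Turn the merged (sum, count) pair into the final average."""
    total, count = state
    if count == 0:
        raise ValueError("average of empty input")
    return total / count

# Three hypothetical nodes, each holding part of the data set.
node_data = [[1.0, 2.0, 3.0], [10.0], [4.0, 5.0]]
partials = [partial_avg(chunk) for chunk in node_data]

state: Partial = (0.0, 0)
for p in partials:
    state = merge(state, p)

average = finalize(state)
print(state, average)
```

Because each node ships a (sum, count) pair rather than a local average, the 
result is the same as averaging all values in one place, even when the nodes 
hold different numbers of rows.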

> 2.
> Is the resource model optimized compared to MapReduce?
>

Yes.  This will happen because multiple levels of aggregation can be done in 
one tree without the barrier between map and reduce imposed by the MapReduce 
structure.
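
As a rough illustration of multiple levels of aggregation in one tree, the 
Python sketch below merges (sum, count) partial states level by level until a 
single root state remains. The fan_in parameter, the function names, and the 
leaf values are hypothetical; a real engine would stream partial results 
between nodes rather than loop over in-memory lists.

```python
from functools import reduce
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum, count)

def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial aggregation states."""
    return a[0] + b[0], a[1] + b[1]

def merge_level(states: List[Partial], fan_in: int) -> List[Partial]:
    """One tree level: each parent merges up to fan_in child states."""
    return [
        reduce(merge, states[i:i + fan_in])
        for i in range(0, len(states), fan_in)
    ]

def tree_aggregate(leaves: List[Partial], fan_in: int = 2) -> Partial:
    """Merge leaf states upward until only the root state is left."""
    if not leaves:
        raise ValueError("no leaf states to aggregate")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = leaves
    while len(level) > 1:
        level = merge_level(level, fan_in)
    return level[0]

# Hypothetical (sum, count) states produced by five leaf nodes.
leaf_states = [(6.0, 3), (10.0, 1), (9.0, 2), (20.0, 4), (5.0, 5)]
root = tree_aggregate(leaf_states, fan_in=2)
tree_average = root[0] / root[1]
print(root, tree_average)
```

Since merge is associative and commutative, the shape of the tree (fan-in, 
number of levels) changes only how the work is spread out, not the result.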
