I'm still not sure I follow (and pardon what must be a very rudimentary misunderstanding on my part) how you compute an average across a data set if the data is split across nodes. With MapReduce, the reducer can compute it because all data for a given key is sent to a single reducer. How would this work with Drill?

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Thursday, April 4, 2013 9:27 AM
To: [email protected]; devansh kumar
Subject: Re: Basic queries regarding Apache Drill working

On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <[email protected]> wrote:

> Hi,
>
> I am new and am trying to understand how Apache Drill works but i
> have a few queries.
> Can anyone help me understand these things?
>
> 1.
> I am trying to understand if the execution engine is going to break up
> the data.
> Normally the data will already have been broken up across a cluster.
> What will happen if i am trying to an aggregation operation like (AVERAGE).
> How will that work??
> Yes.
> I have seen operations as SUM and COUNT.
> How will the Query execution tree look like in case of an AVERAGE
> It will look exactly like a SUM or COUNT except that two numbers will be accumulated instead of one.
> 2.
> Does the Resource model is optimized when compared to MapReduce.
> Yes. This will happen because multiple levels of aggregation can be done in one tree without the barrier between map and reduce imposed by the MapReduce structure.
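
For what it's worth, the "two numbers will be accumulated instead of one" remark above can be sketched roughly as follows. This is only an illustration in Python, not Drill's actual code: the function names (partial_avg, merge, finalize, distributed_average) and the list-of-lists stand-in for data spread across nodes are all assumptions made up for the example. Each node reduces its local rows to a (sum, count) pair, the pairs are merged at any number of levels, and the division happens once at the root.

```python
from functools import reduce

def partial_avg(values):
    """Per-node step (hypothetical name): reduce local rows to (sum, count)."""
    return (sum(values), len(values))

def merge(a, b):
    """Combine two partial states. Associative, so it can run at any tree level."""
    return (a[0] + b[0], a[1] + b[1])

def finalize(state):
    """Root step: turn the accumulated (sum, count) into the average."""
    total, count = state
    return total / count if count else None

def distributed_average(partitions):
    """Simulate nodes with a list of lists; each inner list is one node's data."""
    partials = [partial_avg(p) for p in partitions]
    return finalize(reduce(merge, partials, (0, 0)))

# Three "nodes" holding unevenly sized slices of the same column.
nodes = [[1, 2, 3], [4, 5], [6]]
print(distributed_average(nodes))
```

Note that averaging the per-node averages would give the wrong answer whenever the slices differ in size, which is why the count has to travel along with the sum. Because merge is associative, intermediate nodes can combine pairs from their children before passing them up, which matches the multi-level aggregation tree described in the quoted reply.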
