Thanks Ted. What I meant was that pre-computing the aggregations, as with cubes, might be better. But as you mentioned, that only holds if I know the queries ahead of time.
On Mon, Jun 10, 2013 at 2:20 PM, Ted Dunning <[email protected]> wrote:
> On Mon, Jun 10, 2013 at 10:35 AM, AnilKumar B <[email protected]> wrote:
>
> > Hi,
> >
> > I went through the Drill documentation and am going through the source
> > code. I have a few questions regarding Drill. Can anyone help me
> > understand it better?
> >
> > 1) How are the Drill aggregations real time? It is going to scan all
> > the records anyway, right? What exactly does it optimize compared to
> > MapReduce-based Hive (considering the index feature)?
>
> Real-time is often used in a bit of a sloppy fashion. The meaning with
> respect to Drill is "ad hoc, interactive queries".
>
> > 2) For aggregations, isn't cube materialization a better solution?
> > For example, an HBase-Lattice kind of solution.
>
> Cubes are fine if you know what you are doing ahead of time. They still
> require a pass over the data. Nothing prevents Drill from creating and/or
> using cubes.
>
> > 3) What exactly are the real use cases for Drill? Whenever we say
> > interactive, we mostly include aggregations, and when we say aggregations
> > they definitely cannot be real time when we scan the whole raw data.
>
> Aggregation is a fine use case. There are many others as well. For
> instance, incremental cooccurrence counting. Or, with special UDFs, the
> inner loop of many machine learning applications.
>
> Drill has an especially flexible scanner API which will allow cross data
> source scanning.
>
> Not sure what you are getting at, though, so I may have misinterpreted
> something you said.
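As an aside, the cube trade-off Ted describes can be sketched with a toy example (plain Python, not Drill code; the record layout and dimension names here are made up for illustration): a pre-computed cube answers the group-bys it was built for without rescanning, but any query on a dimension the cube lacks still forces a full pass over the raw data.

```python
# Toy illustration of cube pre-aggregation vs. ad hoc scanning.
# All names and data here are hypothetical, not from Drill.
from collections import defaultdict

# Hypothetical raw fact records: (region, product, hour, amount).
records = [
    ("us", "book", 9, 10.0),
    ("us", "book", 10, 5.0),
    ("eu", "toy", 9, 7.0),
    ("eu", "book", 11, 3.0),
]

def build_cube(rows, dims):
    """One pass over the data pre-aggregates sums for the chosen dims."""
    cube = defaultdict(float)
    for region, product, hour, amount in rows:
        fields = {"region": region, "product": product, "hour": hour}
        cube[tuple(fields[d] for d in dims)] += amount
    return cube

# Cube built ahead of time over (region, product) only.
cube = build_cube(records, ("region", "product"))

# Known-in-advance query: answered from the cube, no rescan.
print(cube[("us", "book")])                         # 15.0

# Ad hoc query on a dimension the cube lacks (hour): full scan required.
print(sum(a for _, _, h, a in records if h == 9))   # 17.0
```

This is the point about knowing the workload ahead of time: the cube still costs one pass to build, and only helps for the dimension combinations chosen up front, whereas an ad hoc engine like Drill scans on demand.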
