Regarding comparisons: how does Drill compare to Druid, which is also an in-memory warehouse? Does Drill support joins to in-memory dimension tables, unlike Druid? Does it have any limitation on the number of records it can fetch, etc.?
Regards,
Amit

> On May 16, 2014, at 8:46 PM, Jason Altekruse <[email protected]> wrote:
>
> Ted covered the most important points. I just want to add a few
> clarifications.
>
> While the code for Drill so far is written in pure Java, there is no
> specific requirement that all of Drill run in Java. Part of the motivation
> for using the in-memory representation of records that we did, making it
> columnar and storing it in Java native ByteBuffers, was to enable
> integration with native code compiled from C/C++ to run some of our
> operators. ByteBuffers are part of the official Java API, but their use is
> not generally recommended. They allow memory operations that you do not
> find in typical Java data types and structures, but require you to manage
> your own memory.
>
> One important use case for us is the ability to pass them through the Java
> Native Interface without having to do a copy. While it is still inefficient
> to jump from Java to C on every record, we should be able to define a clean
> interface to take a batch of records (around 1000) into a C context in a
> single jump, and after the C code finishes processing them, a single jump
> back into the Java context will complete quickly in the same manner as the
> jump in the other direction.
>
> With this consideration, any language you could pass data to from C would
> be compatible. While we likely will not support a wide array of plugin
> languages soon, it should be possible for people to plug in a variety of
> existing codebases to add data processing functionality to Drill.
>
> -Jason Altekruse
>
>
>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <[email protected]> wrote:
>>
>> Drill is a very different tool from Spark or even from Spark SQL (aka
>> Shark).
>>
>> There is some overlap, but there are important differences. For instance:
>>
>> - Drill supports weakly typed SQL.
>>
>> - Drill has a very clever way to pass data from one processor to another.
>> This allows very efficient processing.
>>
>> - Drill generates code in response to the query and to observed data. This
>> is a big deal since it allows high speed with dynamic types.
>>
>> - Drill supports full ANSI SQL, not HiveQL.
>>
>> - Spark supports programming in Scala.
>>
>> - Spark ties distributed data objects to objects in a language like Java
>> or Scala rather than using a columnar form. This makes generic
>> user-written code easier, but is less efficient.
>>
>>
>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I started exploring Drill; it looks like a very interesting tool. Can
>>> somebody explain how Drill compares with Apache Spark and Storm?
>>> Do we still need Apache Spark along with Drill in the big data stack? Or
>>> can Drill directly serve as a replacement for Spark?
>>>
>>> Thanks,
>>> Ravi
>>
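To make Jason's point above concrete, here is a rough, hypothetical sketch (not Drill's actual value-vector API; the class and constant names are invented) of a columnar batch of fixed-width values held in a direct, off-heap ByteBuffer. Because the bytes live outside the GC-managed heap, JNI could expose their address to C/C++ code without a copy:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only -- shows the idea of a columnar, off-heap batch.
public class IntColumnBatch {
    static final int BATCH_SIZE = 1000; // roughly the batch size mentioned above

    private final ByteBuffer buf;

    public IntColumnBatch() {
        // allocateDirect places the bytes outside the Java heap, so native
        // code reached via JNI can read them in place, without copying.
        buf = ByteBuffer.allocateDirect(BATCH_SIZE * Integer.BYTES)
                        .order(ByteOrder.nativeOrder());
    }

    public void set(int row, int value) {
        buf.putInt(row * Integer.BYTES, value);
    }

    public int get(int row) {
        return buf.getInt(row * Integer.BYTES);
    }
}
```

With a layout like this, a whole batch of ~1000 records can cross the Java/C boundary in one JNI call rather than one call per record, which is the amortization Jason describes.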

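Ted's contrast between Spark's object-bound records and Drill's columnar form can also be sketched. The class names below are hypothetical, purely for illustration: in the row-oriented layout each record is a separate Java object with fields scattered across the heap, while in the columnar layout each field is one flat primitive array, so scanning a single column touches contiguous memory and allocates no per-record objects:

```java
// Row-oriented: one Java object per record (the style Ted attributes to
// tying distributed data to language objects).
class TradeRow {
    final long id;
    final double price;
    TradeRow(long id, double price) { this.id = id; this.price = price; }
}

// Column-oriented: one primitive array per field, as in a columnar form.
class TradeColumns {
    final long[] ids;
    final double[] prices;
    TradeColumns(int n) { ids = new long[n]; prices = new double[n]; }

    double sumPrices() {
        double sum = 0;
        for (double p : prices) sum += p; // sequential, cache-friendly scan
        return sum;
    }
}
```

The row form makes generic user code natural (you just pass objects around), while the column form trades that convenience for better memory locality and cheaper per-field scans, which matches the efficiency trade-off Ted describes.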