Regarding comparisons: how does Drill compare to Druid, which is also an in-memory warehouse? Does Drill support joins to in-memory dimension tables, unlike Druid? Does it have any limitation on the number of records it can fetch, etc.?
Regards,
Amit

> On May 16, 2014, at 8:46 PM, Jason Altekruse <[email protected]> wrote:
>
> Ted covered the most important points. I just want to add a few
> clarifications.
>
> While the code for Drill so far is written in pure Java, there is no
> specific requirement that all of Drill run in Java. Part of the motivation
> for using the in-memory representation of records that we did, making it
> columnar and storing it in Java native ByteBuffers, was to enable
> integration with native code compiled from C/C++ to run some of our
> operators. ByteBuffers are part of the official Java API, but their use is
> not generally recommended. They allow memory operations that you do not
> find in typical Java data types and structures, but require you to manage
> your own memory.
>
> One important use case for us is the ability to pass them through the Java
> Native Interface without having to do a copy. While it is still inefficient
> to jump from Java to C on every record, we should be able to define a clean
> interface to take a batch of records (around 1000) into a C context in a
> single jump, and after the C code finishes processing them, a single jump
> back into the Java context will complete quickly in the same manner as the
> jump in the other direction.
>
> With this consideration, any language you could pass data to from C would
> be compatible. While we likely will not support a wide array of plugin
> languages soon, it should be possible for people to plug in a variety of
> existing codebases to add data processing functionality to Drill.
>
> -Jason Altekruse
>
>
>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <[email protected]> wrote:
>>
>> Drill is a very different tool from Spark or even from Spark SQL (aka
>> Shark).
>>
>> There is some overlap, but there are important differences. For instance:
>>
>> - Drill supports weakly typed SQL.
>>
>> - Drill has a very clever way to pass data from one processor to another.
>> This allows very efficient processing.
>>
>> - Drill generates code in response to the query and to observed data. This
>> is a big deal since it allows high speed with dynamic types.
>>
>> - Drill supports full ANSI SQL, not HiveQL.
>>
>> - Spark supports programming in Scala.
>>
>> - Spark ties distributed data objects to objects in a language like Java
>> or Scala rather than using a columnar form. This makes generic
>> user-written code easier, but is less efficient.
>>
>>
>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I started exploring Drill; it looks like a very interesting tool. Can
>>> somebody explain how Drill compares with Apache Spark and Storm?
>>> Do we still need Apache Spark along with Drill in the big data stack? Or
>>> can Drill directly serve as a replacement for Spark?
>>>
>>> Thanks,
>>> Ravi
>>
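To make Jason's point above concrete, here is a rough, hypothetical sketch (not Drill's actual value-vector API; the class and constant names are invented) of a columnar batch of fixed-width values held in a direct, off-heap ByteBuffer. Because the bytes live outside the GC-managed heap, JNI could expose their address to C/C++ code without a copy:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only -- shows the idea of a columnar, off-heap batch.
public class IntColumnBatch {
    static final int BATCH_SIZE = 1000; // roughly the batch size mentioned above

    private final ByteBuffer buf;

    public IntColumnBatch() {
        // allocateDirect places the bytes outside the Java heap, so native
        // code reached via JNI can read them in place, without copying.
        buf = ByteBuffer.allocateDirect(BATCH_SIZE * Integer.BYTES)
                        .order(ByteOrder.nativeOrder());
    }

    public void set(int row, int value) {
        buf.putInt(row * Integer.BYTES, value);
    }

    public int get(int row) {
        return buf.getInt(row * Integer.BYTES);
    }
}
```

With a layout like this, a whole batch of ~1000 records can cross the Java/C boundary in one JNI call rather than one call per record, which is the amortization Jason describes.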

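Ted's contrast between Spark's object-bound records and Drill's columnar form can also be sketched. The class names below are hypothetical, purely for illustration: in the row-oriented layout each record is a separate Java object with fields scattered across the heap, while in the columnar layout each field is one flat primitive array, so scanning a single column touches contiguous memory and allocates no per-record objects:

```java
// Row-oriented: one Java object per record (the style Ted attributes to
// tying distributed data to language objects).
class TradeRow {
    final long id;
    final double price;
    TradeRow(long id, double price) { this.id = id; this.price = price; }
}

// Column-oriented: one primitive array per field, as in a columnar form.
class TradeColumns {
    final long[] ids;
    final double[] prices;
    TradeColumns(int n) { ids = new long[n]; prices = new double[n]; }

    double sumPrices() {
        double sum = 0;
        for (double p : prices) sum += p; // sequential, cache-friendly scan
        return sum;
    }
}
```

The row form makes generic user code natural (you just pass objects around), while the column form trades that convenience for better memory locality and cheaper per-field scans, which matches the efficiency trade-off Ted describes.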