Re: Drill with Spark

Amit Matety Sat, 17 May 2014 15:20:47 -0700

Thanks, Neeraja. I will check out the link provided.

Sent from my iPhone


> On May 17, 2014, at 12:12 PM, Neeraja Rentachintala 
> <[email protected]> wrote:
> 
> In addition to what others said, below are a few other points (answered in
> an email thread some time back).
> 
> 
> -----------
> - Drill provides ANSI SQL. This means that all the BI/analytics and SQL
> tools can work as is with Drill using JDBC/ODBC. Druid provides REST APIs
> as the query layer. I am not sure if Druid has a SQL layer at all (I don't
> see it in their docs).
> 
> - Query flexibility is high with Drill. For example, Druid supports
> groupBy-style queries but doesn't support JOINs. Drill supports all the key
> analytic functionality, such as JOINs, aggregations, sorts, filters, and a
> wide variety of functions to operate on data, which makes it suitable for a
> broader set of use cases.
> 
> - Drill supports queries natively on Hadoop data formats (JSON, Parquet,
> text, as well as all Hive file formats). You don't need to load or copy the
> data into a specific format in order to query it.
> 
> - Drill can query self-describing data such as JSON, Parquet, and HBase
> directly, without defining schema overlays in Hive. You can take a look at
> the "Apache Drill in 10 Minutes" doc below to get started with Drill around
> some of these capabilities.
> https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
> 
> 
> 
>> On Sat, May 17, 2014 at 7:43 AM, Timothy Chen <[email protected]> wrote:
>> 
>> Druid, just like Redshift, requires an extra ETL step to import the data
>> before you can query it, which reduces the freshness of your queryable data.
>> 
>> Obviously there are pros and cons to each decision, but Drill also tries to
>> do as many optimizations as possible with the metadata available, and down
>> the road it will be able to gather enough stats after a scan, or perhaps
>> even through an extra compute-stats step like the one Impala does.
>> 
>> Tim
>> 
>> Sent from my iPhone
>> 
>>> On May 17, 2014, at 12:27 AM, Amit Matety <[email protected]> wrote:
>>> 
>>> In regards to comparison: how does it compare to Druid, which is also an
>>> in-memory warehouse? Does Drill support joins to in-memory dimension tables,
>>> unlike Druid? Does it have any limitation on the number of records it can
>>> fetch, etc.?
>>> 
>>> Regards,
>>> Amit
>>> 
>>>> On May 16, 2014, at 8:46 PM, Jason Altekruse <[email protected]> wrote:
>>>> 
>>>> Ted covered the most important points. I just want to add a few
>>>> clarifications.
>>>> 
>>>> While the code for Drill so far is written in pure Java, there is no
>>>> specific requirement that all of Drill run in Java. Part of the motivation
>>>> for the in-memory representation of records we chose, making it columnar
>>>> and storing it in native (direct) Java ByteBuffers, was to enable
>>>> integration with native code compiled from C/C++ to run some of our
>>>> operators. ByteBuffers are part of the official Java API, but their use is
>>>> not recommended. They allow memory operations that you do not find in
>>>> typical Java data types and structures, but require you to manage your own
>>>> memory.
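A minimal sketch of the idea above, assuming a single fixed-width int column. The class name `DirectIntColumn` is hypothetical and this is not Drill's actual in-memory vector API; it only shows that `ByteBuffer.allocateDirect` places the column's bytes outside the Java heap and that values are read and written by explicit byte offset rather than as Java objects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: one int column stored off-heap in a direct ByteBuffer.
// Not Drill's vector API; real systems add explicit memory management on top.
final class DirectIntColumn {
    private final ByteBuffer buf;
    private final int capacity;

    DirectIntColumn(int capacity) {
        this.capacity = capacity;
        // allocateDirect places the bytes outside the Java heap, so native code
        // can address them (e.g. via JNI's GetDirectBufferAddress) without a copy.
        this.buf = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                             .order(ByteOrder.nativeOrder());
    }

    void set(int index, int value) {
        checkIndex(index);
        // Absolute put: writes at a computed byte offset, position is untouched.
        buf.putInt(index * Integer.BYTES, value);
    }

    int get(int index) {
        checkIndex(index);
        return buf.getInt(index * Integer.BYTES);
    }

    // Scans the whole column sequentially.
    long sum() {
        long total = 0;
        for (int i = 0; i < capacity; i++) {
            total += get(i);
        }
        return total;
    }

    boolean isDirect() {
        return buf.isDirect();
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

For example, a 4-slot column filled with 10, 20, 30, 40 sums to 100, and `isDirect()` reports `true`.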
>>>> 
>>>> One important use case for us is the ability to pass them through the Java
>>>> Native Interface (JNI) without having to make a copy. While it would still
>>>> be inefficient to jump from Java to C for every record, we should be able
>>>> to define a clean interface that takes a batch of records (around 1000)
>>>> into a C context in a single jump; after the C code finishes processing
>>>> them, a single jump back into the Java context will complete just as
>>>> quickly as the jump in the other direction.
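To make the "one jump per batch" point concrete, here is a sketch in which a hypothetical `NativeKernel` interface stands in for a JNI-bound C function (no real native code is involved; in an actual binding it would be a `native` method, for example one taking a direct ByteBuffer). Records are handed over in batches of 1000, so 2,500 records need three boundary crossings instead of 2,500.

```java
// Hypothetical stand-in for a native (C/C++) operator reached through JNI.
// Each call models ONE Java -> C -> Java round trip.
interface NativeKernel {
    void processBatch(int[] values, int offset, int length);
}

final class BatchDriver {
    // "Around 1000" records per crossing, as described in the thread.
    static final int BATCH_SIZE = 1000;

    // Hands the records to the kernel one batch at a time and returns
    // how many boundary crossings were needed.
    static int run(int[] records, NativeKernel kernel) {
        int crossings = 0;
        for (int offset = 0; offset < records.length; offset += BATCH_SIZE) {
            int length = Math.min(BATCH_SIZE, records.length - offset);
            kernel.processBatch(records, offset, length);
            crossings++;
        }
        return crossings;
    }
}
```

The design choice is simply amortization: the fixed cost of each crossing is paid once per batch, and the per-record work happens entirely on one side of the boundary.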
>>>> 
>>>> Given this design, any language you can pass data to from C would be
>>>> compatible. While we likely will not support a wide array of plugin
>>>> languages soon, it should be possible for people to plug in a variety of
>>>> existing codebases to add data processing functionality to Drill.
>>>> 
>>>> -Jason Altekruse
>>>> 
>>>> 
>>>>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <[email protected]> wrote:
>>>>> 
>>>>> Drill is a very different tool from Spark or even from Spark SQL (or its
>>>>> predecessor, Shark).
>>>>> 
>>>>> There is some overlap, but there are important differences. For instance,
>>>>> 
>>>>> - Drill supports weakly typed SQL.
>>>>> 
>>>>> - Drill has a very clever way to pass data from one processor to another.
>>>>> This allows very efficient processing.
>>>>> 
>>>>> - Drill generates code in response to the query and to observed data.
>>>>> This is a big deal since it allows high speed with dynamic types.
>>>>> 
>>>>> - Drill supports full ANSI SQL, not HiveQL.
>>>>> 
>>>>> - Spark supports programming in Scala
>>>>> 
>>>>> - Spark ties distributed data objects to objects in a language like Java
>>>>> or Scala rather than using a columnar form. This makes generic
>>>>> user-written code easier, but is less efficient.
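The row-versus-columnar contrast in the last point can be illustrated with a small, hypothetical `Person` record: one Java object per row versus one primitive array per field. This is only a sketch of the two layouts, not code from Spark or Drill, and it makes no performance measurement; it shows that the columnar version sums a field by scanning a single contiguous `int[]`, while the row version dereferences one object per record.

```java
import java.util.List;

// Illustrative contrast only; neither layout is taken from Spark or Drill.
final class Layouts {
    // Row-oriented: one Java object per record.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Column-oriented: one contiguous array per field.
    static final class PersonColumns {
        final String[] names;
        final int[] ages;

        PersonColumns(List<Person> rows) {
            names = new String[rows.size()];
            ages = new int[rows.size()];
            for (int i = 0; i < rows.size(); i++) {
                names[i] = rows.get(i).name;
                ages[i] = rows.get(i).age;
            }
        }
    }

    // Summing a field over rows touches one object per record.
    static long sumAgesRows(List<Person> rows) {
        long total = 0;
        for (Person p : rows) {
            total += p.age;
        }
        return total;
    }

    // Summing the same field over columns reads only the ages array.
    static long sumAgesColumns(PersonColumns cols) {
        long total = 0;
        for (int age : cols.ages) {
            total += age;
        }
        return total;
    }
}
```

The row form is what makes generic user code convenient (any method can take a `Person`), which is the trade-off the point above describes.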
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
>>>>> <[email protected]> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I started exploring Drill, and it looks like a very interesting tool.
>>>>>> Can somebody explain how Drill compares with Apache Spark and Storm?
>>>>>> Do we still need Apache Spark along with Drill in the big data stack, or
>>>>>> can Drill directly serve as a replacement for Spark?
>>>>>> 
>>>>>> Thanks,
>>>>>> Ravi
>>
