Thanks Neeraja. I will check out the link provided.

Sent from my iPhone

> On May 17, 2014, at 12:12 PM, Neeraja Rentachintala
> <[email protected]> wrote:
>
> In addition to what others said, below are a few other points (answered in
> an email thread some time back).
>
> -----------
> - Drill provides ANSI SQL. This means that all the BI/analytics and SQL
> tools can work as-is with Drill using JDBC/ODBC. Druid provides REST APIs
> as the query layer. I am not sure if Druid has a SQL layer at all (I don't
> see it in their docs).
>
> - Query flexibility is high with Drill. For example, Druid supports groupBy
> style queries but doesn't support JOINs. Drill supports all the key
> analytic functionality such as JOINs, aggregations, sorts, filters, and a
> wide variety of functions to operate on data, which makes it suitable for a
> broader set of use cases.
>
> - Drill supports queries natively on Hadoop data formats (JSON, Parquet,
> text, as well as all Hive file formats). You don't need to load or copy the
> data into a specific format in order to run queries.
>
> - Drill can query self-describing data such as JSON, Parquet, and HBase
> directly, without defining schema overlays in Hive. You can take a look at
> the "Apache Drill in 10 Minutes" doc below to get started with Drill around
> some of these capabilities.
> https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
>
>> On Sat, May 17, 2014 at 7:43 AM, Timothy Chen <[email protected]> wrote:
>>
>> Druid, just like Redshift, requires an extra ETL step to import the data
>> before you can query it, which reduces the freshness of your queryable
>> data.
>>
>> Obviously there are pros and cons to each decision, but Drill also tries to
>> do as many optimizations as possible with the metadata available, and down
>> the road it will be able to gain enough stats after a scan, or perhaps even
>> from an extra compute-stats step like what Impala does.
>>
>> Tim
>>
>> Sent from my iPhone
>>
>>> On May 17, 2014, at 12:27 AM, Amit Matety <[email protected]> wrote:
>>>
>>> Regarding the comparison: how does it compare to Druid, which is also
>>> an in-memory warehouse? Does Drill support joins to in-memory dimension
>>> tables, unlike Druid? Does it have any limitation on the number of records
>>> it can fetch, etc.?
>>>
>>> Regards,
>>> Amit
>>>
>>>> On May 16, 2014, at 8:46 PM, Jason Altekruse <[email protected]>
>>>> wrote:
>>>>
>>>> Ted covered the most important points. I just want to add a few
>>>> clarifications.
>>>>
>>>> While the code for Drill so far is written in pure Java, there is no
>>>> specific requirement that all of Drill run in Java. Part of the motivation
>>>> for the in-memory representation of records that we chose, making it
>>>> columnar and storing it in Java native ByteBuffers, was to enable
>>>> integration with native code compiled from C/C++ to run some of our
>>>> operators. ByteBuffers are part of the official Java API, but their use is
>>>> not recommended. They allow memory operations that you do not find in
>>>> typical Java data types and structures, but require you to manage your own
>>>> memory.
>>>>
>>>> One important use case for us is the ability to pass them through the Java
>>>> Native Interface without having to do a copy. While it is still inefficient
>>>> to jump from Java to C for every record, we should be able to define a
>>>> clean interface that takes a batch of records (around 1000) in a single
>>>> jump to a C context. After the C code finishes processing them, a single
>>>> jump back into the Java context should complete just as quickly as the
>>>> jump in the other direction.
>>>>
>>>> With this consideration, any language you could pass data to from C would
>>>> be compatible.
>>>> While we likely will not support a wide array of plugin
>>>> languages soon, it should be possible for people to plug in a variety of
>>>> existing codebases to add data processing functionality to Drill.
>>>>
>>>> -Jason Altekruse
>>>>
>>>>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Drill is a very different tool from Spark or even from Spark SQL (aka
>>>>> Shark).
>>>>>
>>>>> There is some overlap, but there are important differences. For instance:
>>>>>
>>>>> - Drill supports weakly typed SQL.
>>>>>
>>>>> - Drill has a very clever way to pass data from one processor to another.
>>>>> This allows very efficient processing.
>>>>>
>>>>> - Drill generates code in response to the query and to observed data. This
>>>>> is a big deal since it allows high speed with dynamic types.
>>>>>
>>>>> - Drill supports full ANSI SQL, not Hive QL.
>>>>>
>>>>> - Spark supports programming in Scala.
>>>>>
>>>>> - Spark ties distributed data objects to objects in a language like Java or
>>>>> Scala rather than using a columnar form. This makes generic user-written
>>>>> code easier, but is less efficient.
>>>>>
>>>>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I started exploring Drill; it looks like a very interesting tool. Can
>>>>>> somebody explain how Drill compares with Apache Spark and Storm?
>>>>>> Do we still need Apache Spark along with Drill in the big data stack, or
>>>>>> can Drill directly serve as a replacement for Spark?
>>>>>>
>>>>>> Thanks,
>>>>>> Ravi
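
For anyone following the thread, here is a rough sketch of the idea Jason describes: keeping a column of values in a direct ByteBuffer and handing a whole batch (around 1000 records) to an operator in one call. This is only an illustration under my own assumptions. The names ColumnBatchSketch, IntColumn, and sumBatch are hypothetical and are not Drill's actual classes, and the "operator" here is plain Java standing in for native code. In a real JNI setup, the C side could obtain the buffer's address with GetDirectBufferAddress instead of copying the data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration only; not Drill's actual in-memory record format. */
public class ColumnBatchSketch {

    /** A fixed-width int column stored off-heap in a direct ByteBuffer. */
    static final class IntColumn {
        private final ByteBuffer buf;
        private final int capacity;
        private int count;

        IntColumn(int capacity) {
            this.capacity = capacity;
            // allocateDirect keeps the bytes outside the Java heap, which is what
            // lets native code address them through JNI without a copy.
            this.buf = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                                 .order(ByteOrder.nativeOrder());
        }

        void append(int value) {
            if (count == capacity) {
                throw new IllegalStateException("batch is full");
            }
            // Absolute put: value i lives at byte offset i * 4.
            buf.putInt(count * Integer.BYTES, value);
            count++;
        }

        int get(int index) {
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return buf.getInt(index * Integer.BYTES);
        }

        int size() {
            return count;
        }

        boolean isDirect() {
            return buf.isDirect();
        }
    }

    /**
     * Stand-in for a native operator: one call processes the whole batch,
     * rather than crossing the Java/C boundary once per record.
     */
    static long sumBatch(IntColumn column) {
        long sum = 0;
        for (int i = 0; i < column.size(); i++) {
            sum += column.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        IntColumn column = new IntColumn(1000);
        for (int i = 1; i <= 1000; i++) {
            column.append(i);
        }
        System.out.println("direct buffer: " + column.isDirect());
        System.out.println("batch sum: " + sumBatch(column));
    }
}
```

The point of the batch-at-a-time shape is that the fixed cost of the Java-to-C transition is paid once per batch instead of once per record, as described in Jason's message.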
