RE: Apache Drill Vs Spark SQL

Tridib Samanta Wed, 29 Oct 2014 11:14:35 -0700
Hi Adam,
Thanks for sharing this! Apache Drill is very easy to get started with. I like
that Drill manages its metadata by itself and does not require Hive (as Spark
does).
 
Thanks
Tridib
 
> Date: Wed, 29 Oct 2014 10:50:37 -0700
> Subject: Re: Apache Drill Vs Spark SQL
> From: [email protected]
> To: [email protected]
> 
> Hi Tridib,
> 
> I just completed a simple evaluation of Drill 0.6.0 and Spark SQL 1.1.0.  I
> ran a few queries over 14GB of Snappy compressed Parquet files on a four
> server MapR cluster (96 cores, 256 GB).  Here are the results.
> 
> Spark SQL requires some very minor setup, whereas Drill requires none:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val testData = sqlContext.parquetFile("/user/ahunt/test/2014/10/28/")
> testData.registerTempTable("testData")
> 
> In Drill, a simple count query took 19s the first time and 0.9s the second
> time
> SELECT count(*) FROM  dfs.`/user/ahunt/test/2014/10/28/part-*`;
> 
> In Spark SQL, it took 17s the first time and 1.7s the second
> sqlContext.sql("SELECT count(*) FROM testData").collect().foreach(println)
> 
> In Drill, a simple group by query printed the results, but would not return
> to the prompt without hitting ctrl-c (after 6s).
> SELECT httpResponseCode, count(*) FROM
> dfs.`/user/ahunt/test/2014/10/28/part-*` GROUP BY httpResponseCode;
> 
> In Spark SQL, it finished in 3.6s
> sqlContext.sql("SELECT httpResponseCode,count(*) FROM testData GROUP BY httpResponseCode").collect().foreach(println)
> 
> In Drill, this query never finished (probably due to the issue described
> above).
> SELECT httpResponseCode, count(*) FROM
> dfs.`/user/ahunt/test/2014/10/28/` GROUP BY httpResponseCode
> ORDER BY httpResponseCode DESC;
> 
> In Spark SQL, the same query finished in 5s.
> sqlContext.sql("SELECT httpResponseCode,count(*) FROM testData GROUP BY httpResponseCode ORDER BY httpResponseCode DESC").collect().foreach(println)
> 
> Although Drill seems very promising, it appears to have a few issues to
> work out, and since I already use Spark, I'm going to stick with Spark SQL
> for now.
> 
> Adam
> 
> 
> On Wed, Oct 29, 2014 at 10:00 AM, Tridib Samanta <[email protected]>
> wrote:
> 
> > Hello Experts,
> > I am new to Apache Drill. To me it looks very similar to Spark SQL. I was
> > wondering how it differs from Spark SQL. What are the use cases where
> > Apache Drill thrives compared to Spark SQL?
> >
> > Thanks & Regards
> > Tridib
> >
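
For anyone who wants to repeat the first-run versus second-run comparison quoted above, here is a minimal sketch of a timing helper in Scala. The name `timed` is hypothetical and is not part of Spark or Drill. The workload shown is a plain stand-in so the snippet runs in any Scala REPL without a cluster; it is not the method used to produce the numbers in this thread.

```scala
// Hypothetical helper (not part of Spark or Drill): evaluates a block of code,
// prints the elapsed wall-clock time in seconds, and returns the block's result.
def timed[A](label: String)(block: => A): A = {
  val start = System.nanoTime()
  val result = block                                  // the block is evaluated here
  val elapsedSec = (System.nanoTime() - start) / 1e9  // nanoseconds to seconds
  println(f"$label: $elapsedSec%.2f s")
  result
}

// Stand-in workload so the sketch runs without Spark: sum of 1 to 1,000,000.
val total = timed("stand-in sum") { (1L to 1000000L).sum }
```

In a spark-shell session where `sqlContext` and the `testData` temp table from the quoted message are already defined (an assumption), the same helper could wrap a query such as `timed("count") { sqlContext.sql("SELECT count(*) FROM testData").collect() }`. Running it twice would show the cold and warm timings separately.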