Adam, I had a similar experience with queries not returning immediately; this setting seems like it might help:
ALTER SESSION SET `planner.add_producer_consumer` = false;

Chris Matta
215-701-3146

On Wed, Oct 29, 2014 at 1:50 PM, Adam Hunt wrote:

> Hi Tridib,
>
> I just completed a simple evaluation of Drill 0.6.0 and Spark SQL 1.1.0. I
> ran a few queries over 14GB of Snappy-compressed Parquet files on a
> four-server MapR cluster (96 cores, 256 GB). Here are the results.
>
> Spark SQL requires some very, very minor setup, whereas Drill doesn't.
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val testData = sqlContext.parquetFile("/user/ahunt/test/2014/10/28/")
> testData.registerTempTable("testData")
>
> In Drill, a simple count query took 19s the first time and 0.9s the second
> time.
> SELECT count(*) FROM dfs.`/user/ahunt/test/2014/10/28/part-*`;
>
> In Spark SQL, it took 17s the first time and 1.7s the second.
> sqlContext.sql("SELECT count(*) FROM testData").collect().foreach(println)
>
> In Drill, a simple GROUP BY query printed the results, but would not return
> to the prompt without hitting Ctrl-C (after 6s).
> SELECT httpResponseCode, count(*) FROM dfs.`/user/ahunt/test/2014/10/28/part-*` GROUP BY httpResponseCode;
>
> In Spark SQL, it finished in 3.6s.
> sqlContext.sql("SELECT httpResponseCode, count(*) FROM testData GROUP BY httpResponseCode").collect().foreach(println)
>
> In Drill, this query never finished (probably due to the issue described
> above).
> SELECT httpResponseCode, count(*) FROM dfs.`/user/ahunt/test/2014/10/28/` GROUP BY httpResponseCode ORDER BY httpResponseCode DESC;
>
> In Spark SQL, the same query finished in 5s.
> sqlContext.sql("SELECT httpResponseCode, count(*) FROM testData GROUP BY httpResponseCode ORDER BY httpResponseCode DESC").collect().foreach(println)
>
> Although Drill seems very promising, it appears to have a few issues to
> work out, and since I already use Spark I'm going to stick with Spark SQL
> for now.
>
> Adam
>
> On Wed, Oct 29, 2014 at 10:00 AM, Tridib Samanta wrote:
>
> > Hello Experts,
> >
> > I am new to Apache Drill. To me it's very similar to Spark SQL. I was
> > wondering how it differs from Spark SQL. What are the use cases where
> > Apache Drill thrives compared to Spark SQL?
> >
> > Thanks & Regards
> > Tridib
