What might be the biggest factor affecting running time here is that Drill's query execution is not fault tolerant, while Spark's is. The philosophy is different: Drill's says, "When you're doing interactive analytics and a node dies, killing your query as it goes, just run the query again."
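
To make the "just run the query again" idea concrete, here is a minimal client-side sketch in Python. It is not Drill's or Spark's API: `QueryFailed`, `run_with_retry`, and the `run_query` callable are hypothetical names invented for illustration. The only point is that a failed query is restarted from scratch rather than recovered from partial results.

```python
import time


class QueryFailed(Exception):
    """Hypothetical error standing in for 'a node died mid-query'."""


def run_with_retry(run_query, max_attempts=3, delay_seconds=0.0):
    """Call run_query() until it succeeds or max_attempts is reached.

    Each attempt starts from scratch; nothing from a failed attempt is reused.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryFailed as err:
            last_error = err
            # Pause before the next attempt, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise last_error
```

A caller would wrap whatever function submits the query, for example `run_with_retry(lambda: submit(sql))`, where `submit` is again an assumed helper. The trade-off is that the engine can skip the bookkeeping needed for mid-query recovery, at the cost of redoing all the work when a failure does happen.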

On 2022/04/07 16:11, Wes Peng wrote:

Hi Jacek,

Spark and Drill are not directly related, but they have a similar architecture.

If you read the book "Learning Apache Drill" (I believe it's free online), chapter 3 describes Drill's SQL engine architecture.


It's quite similar to Spark's.

The distributed implementation architecture is also almost the same as Spark's.


Though they are separate products, they have a similar implementation, IMO.

No, I didn't use a statement optimized for Drill. It's just a common SQL statement.

As for why Drill is faster, I think it's because of Drill's direct mmap technology. It consumes more memory than Spark, so it is faster.

Thanks.


Jacek Laskowski wrote:
Is it true that Drill is Spark, or vice versa, under the hood? If so, how is it possible that Drill is faster? What does Drill do to make the query faster? Could it be that you used a type of query Drill is optimized for? I'm just guessing and am really curious (not implying that one is better or worse than the other).

