Re: Why Apache Spark doesn't use Calcite?

Julian Hyde Mon, 13 Jan 2020 14:02:28 -0800

In the earliest days they had Shark (a Spark back-end hacked into Hive)[1]. So, 
they knew some people would want to use SQL. But I don’t think anyone realized 
how important SQL would become.


I think they knew what they were getting with Catalyst. They wanted to make it 
easy to write transformation rules. Because Catalyst is written in Scala, 
transformation rules are very concise. But Catalyst rules fire destructively 
(each rule replaces the plan it matched rather than registering an 
alternative); this also makes them simpler to write, but it prevents the 
Volcano-style nondeterminism that allows true cost-based optimization.
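
To make the distinction concrete, here is a minimal sketch in Python. It uses 
neither the Catalyst nor the Calcite API; the plan classes, the swap_join rule, 
and the cost function are all hypothetical, and rules are applied only at the 
root of the plan to keep it short. Real Catalyst rules are written so that 
firing them is nearly always an improvement; the rule here is deliberately 
cost-neutral so that the difference shows up.

```python
from dataclasses import dataclass


# Hypothetical toy plan nodes (not Catalyst or Calcite classes).
@dataclass(frozen=True)
class Scan:
    table: str
    rows: int


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def swap_join(plan):
    """Hypothetical rule: swap the inputs of a join; leave other plans alone."""
    if isinstance(plan, Join):
        return Join(plan.right, plan.left)
    return plan


def cost(plan):
    """Toy cost model (an assumption): the right input is three times as
    expensive to process as the left, as if it were a hash-join build side."""
    if isinstance(plan, Scan):
        return plan.rows
    return cost(plan.left) + 3 * cost(plan.right)


def rewrite_destructively(plan, rules):
    """Each rule's output replaces its input; the earlier plan is discarded."""
    for rule in rules:
        plan = rule(plan)
    return plan


def explore_and_pick(plan, rules):
    """Keep every plan the rules produce, then choose the cheapest one."""
    seen = {plan}
    frontier = [plan]
    while frontier:
        current = frontier.pop()
        for rule in rules:
            candidate = rule(current)
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return min(seen, key=cost)


plan = Join(Scan("big", 1000), Scan("small", 10))

# The destructive rewrite commits to the swapped plan: cost 10 + 3 * 1000 = 3010.
print(cost(rewrite_destructively(plan, [swap_join])))

# Exploration keeps both orderings and picks the original: cost 1000 + 3 * 10 = 1030.
print(cost(explore_and_pick(plan, [swap_join])))
```

In the destructive version, the rule author has to decide inside the rule 
whether firing is a good idea. In the exploring version, the rule only has to 
be correct, and the cost model makes the choice afterwards.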

Spark was, at that time, a project with huge momentum and lots of talented 
people who were itching to write a query optimizer. In that situation you can 
move faster if you build it yourself.

Julian

[1] https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html

> On Jan 13, 2020, at 1:41 PM, Muhammad Gelbana <[email protected]> wrote:
> 
> Interesting question.
> 
> Someone told me that Spark (~2012) didn't start with support for SQL queries
> (introduced ~2014) in mind. It was probably aimed only at Python-based jobs,
> so Catalyst was enough at the time. That makes sense to me, but I can't
> confirm it.
> 
> 
> 
> On Mon, Jan 13, 2020 at 4:30 PM Michael Mior <[email protected]> wrote:
> 
>> This discussion on the Spark mailing list may be interesting to follow :)
>> 
>> --
>> Michael Mior
>> [email protected]
>> 
>> 
>> ---------- Forwarded message ---------
>> From: newroyker <[email protected]>
>> Date: Mon, Jan 13, 2020 at 09:25
>> Subject: Why Apache Spark doesn't use Calcite?
>> To: <[email protected]>
>> 
>> 
>> Was there a qualitative or quantitative benchmark done before a design
>> decision was made not to use Calcite?
>> 
>> Are there limitations (for heuristic-based, cost-based, *-aware optimizers)
>> in Calcite, and frameworks built on top of Calcite? In the context of big
>> data / TPC-H benchmarks.
>> 
>> I was unable to dig up anything concrete from user group / Jira. Appreciate
>> if any Catalyst veteran here can give me pointers. Trying to defend
>> Spark/Catalyst.
>> 
>> 
>> 
>> 
>> 
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: [email protected]
>>
