Re: Calcite vs Catalyst

Khai Tran Wed, 31 May 2017 10:45:06 -0700

Hi Jordan,
Just want to check if you guys have any plan to contribute back the work of
converting back and forth between Calcite and Spark/Catalyst plans?


Thanks,
Khai

On Thu, Feb 16, 2017 at 3:42 PM, [email protected] <
[email protected]> wrote:

> Calcite differs from Catalyst in many ways. First of all, Catalyst is
> essentially a heuristic optimizer, while Calcite optimizers often combine
> heuristics and cost-based optimization. Catalyst pushes down predicates and
> projections to most data sources, while Calcite can often push down full
> queries. It's certainly also capable of pushing down filters for struct
> fields. Some of these types of features like SPARK-19609 may have to be
> implemented as custom rules. But we've successfully replaced Spark's
> Catalyst optimizer with Calcite and have recorded up to two orders of
> magnitude improvements in performance running TPC-DS queries against many
> databases.
>
> Whether there's value in using Calcite in Spark depends on your use case.
> Drill and other systems are certainly sufficient to take better advantage
> of the features of underlying databases. It's not easy to build the
> conversions between Catalyst plans and Calcite plans - it took us months -
> but doing so allowed us to continue using Spark's popular programmatic APIs
> while significantly improving its performance when querying relational
> databases, Mongo, etc.
>
> > On Feb 16, 2017, at 3:28 PM, Nick Dimiduk <[email protected]> wrote:
> >
> > Heya,
> >
> > I've been using Spark recently and have stumbled across a couple
> surprising
> > bugs/feature gaps. It got me curious about how Calcite would handle the
> > same scenarios. Basically, I'm wondering if Calcite would handle these
> > scenarios directly or if it would defer to the underlying runtime. I.E.,
> > would I be better off for this task with Calcite via Hive or Drill vs.
> > Catalyst via Spark.
> >
> > Here are the tickets for reference.
> >
> > SPARK-19615 Provide Dataset union convenience for divergent schema
> > SPARK-19609 Broadcast joins should pushdown join constraints as Filter to
> > the larger relation
> > SPARK-19638 Filter pushdown not working for struct fields
> >
> > Thanks in advance!
> > Nick
>

Re: Calcite vs Catalyst

Reply via email to