Hi Jordan,

I just wanted to check whether you guys have any plans to contribute back the work on converting back and forth between Calcite and Spark/Catalyst plans?
Thanks,
Khai

On Thu, Feb 16, 2017 at 3:42 PM, [email protected] <[email protected]> wrote:

> Calcite differs from Catalyst in many ways. First of all, Catalyst is
> essentially a heuristic optimizer, while Calcite optimizers often combine
> heuristics and cost-based optimization. Catalyst pushes down predicates and
> projections to most data sources, while Calcite can often push down full
> queries. It's certainly also capable of pushing down filters for struct
> fields. Some of these types of features like SPARK-19609 may have to be
> implemented as custom rules. But we've successfully replaced Spark's
> Catalyst optimizer with Calcite and have recorded up to two orders of
> magnitude improvements in performance running TPC-DS queries against many
> databases.
>
> Whether there's value in using Calcite in Spark depends on your use case.
> Drill and other systems are certainly sufficient to take better advantage
> of the features of underlying databases. It's not easy to build the
> conversions between Catalyst plans and Calcite plans - it took us months -
> but doing so allowed us to continue using Spark's popular programmatic APIs
> while significantly improving its performance when querying relational
> databases, Mongo, etc.
>
> > On Feb 16, 2017, at 3:28 PM, Nick Dimiduk <[email protected]> wrote:
> >
> > Heya,
> >
> > I've been using Spark recently and have stumbled across a couple surprising
> > bugs/feature gaps. It got me curious about how Calcite would handle the
> > same scenarios. Basically, I'm wondering if Calcite would handle these
> > scenarios directly or if it would defer to the underlying runtime. I.E.,
> > would I be better off for this task with Calcite via Hive or Drill vs.
> > Catalyst via Spark.
> >
> > Here are the tickets for reference.
> >
> > SPARK-19615 Provide Dataset union convenience for divergent schema
> > SPARK-19609 Broadcast joins should pushdown join constraints as Filter to
> > the larger relation
> > SPARK-19638 Filter pushdown not working for struct fields
> >
> > Thanks in advance!
> > Nick
