I was reading an article (and its references) on the speedup gains in Spark 2: apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html <https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html>
The main idea is that the generated physical code should be data-centric instead of operator-centric, and should preserve data locality. I am thinking this may apply to Calcite as well. If Calcite switched to the data-centric approach, what could it do, and what would it gain?

Quote:

> The Future: Whole-stage Code Generation
>
> From the above observation, a natural next step for us was to explore the possibility of automatically generating this *handwritten* code at runtime, which we are calling "whole-stage code generation." This idea is inspired by Thomas Neumann's seminal VLDB 2011 paper on *Efficiently Compiling Efficient Query Plans for Modern Hardware <http://www.vldb.org/pvldb/vol4/p539-neumann.pdf>*. For more details on the paper, Adrian Colyer has coordinated with us to publish a review on The Morning Paper blog <http://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware> today.
>
> The goal is to leverage whole-stage code generation so the engine can achieve the performance of hand-written code, yet provide the functionality of a general purpose engine. Rather than relying on operators for processing data at runtime, these operators together generate code at runtime and collapse each fragment of the query, where possible, into a single function and execute that generated code instead.
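To make the contrast concrete, here is a minimal sketch (not Calcite's or Spark's actual API; all class and method names are hypothetical) of the two execution styles for a fragment like `SELECT x * 2 WHERE x > 10`: an operator-centric Volcano-style pipeline, where each operator pulls rows from its child through a virtual `next()` call, versus the kind of single fused loop that whole-stage code generation could emit for the same fragment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class WholeStageSketch {

    // Operator-centric (Volcano) style: scan -> filter -> project, each
    // stage pulling rows one at a time through Iterator.next(). Every row
    // crosses an operator boundary via a virtual call.
    static List<Integer> volcanoStyle(int[] input) {
        Iterator<Integer> scan = new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return i < input.length; }
            public Integer next() { return input[i++]; }
        };
        Iterator<Integer> filter = new Iterator<Integer>() {
            Integer pending = advance();
            Integer advance() {
                while (scan.hasNext()) {
                    int v = scan.next();
                    if (v > 10) return v;   // the WHERE predicate
                }
                return null;
            }
            public boolean hasNext() { return pending != null; }
            public Integer next() {
                Integer v = pending;
                pending = advance();
                return v;
            }
        };
        List<Integer> out = new ArrayList<>();
        while (filter.hasNext()) {
            out.add(filter.next() * 2);      // the projection
        }
        return out;
    }

    // Whole-stage style: what generated code for the same fragment could
    // look like once scan, filter and project are collapsed into a single
    // function. The row stays in a local variable (data locality) and
    // there are no per-row virtual calls between operators.
    static List<Integer> fused(int[] input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            if (v > 10) {
                out.add(v * 2);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {5, 12, 7, 20};
        System.out.println(volcanoStyle(data)); // [24, 40]
        System.out.println(fused(data));        // [24, 40]
    }
}
```

Both methods compute the same result; the point of the paper (and of Spark's Tungsten work) is that the fused form is what a programmer would write by hand, and a planner that already has the whole operator tree, as Calcite does, is in a position to emit it mechanically.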
