I was reading an article (and its references) on the speedup gains in Spark 2: apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html <https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html>
The main idea is that the generated physical code should be data-centric instead of operator-centric, and should preserve data locality. I am thinking this may apply to Calcite as well. If Calcite switched to the data-centric approach, what could it do, and what would it gain?

Quote:

> The Future: Whole-stage Code Generation
>
> From the above observation, a natural next step for us was to explore the possibility of automatically generating this *handwritten* code at runtime, which we are calling "whole-stage code generation." This idea is inspired by Thomas Neumann's seminal VLDB 2011 paper on *Efficiently Compiling Efficient Query Plans for Modern Hardware <http://www.vldb.org/pvldb/vol4/p539-neumann.pdf>*. For more details on the paper, Adrian Colyer has coordinated with us to publish a review on The Morning Paper blog <http://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware> today.
>
> The goal is to leverage whole-stage code generation so the engine can achieve the performance of hand-written code, yet provide the functionality of a general purpose engine. Rather than relying on operators for processing data at runtime, these operators together generate code at runtime and collapse each fragment of the query, where possible, into a single function and execute that generated code instead.
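To make the contrast concrete, here is a minimal sketch (not Calcite's or Spark's actual API; all class and method names are hypothetical) of the two execution styles for a fragment like `SELECT x * 2 WHERE x > 10`: an operator-centric Volcano-style pipeline, where each operator pulls rows from its child through a virtual `next()` call, versus the kind of single fused loop that whole-stage code generation could emit for the same fragment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class WholeStageSketch {

    // Operator-centric (Volcano) style: scan -> filter -> project, each
    // stage pulling rows one at a time through Iterator.next(). Every row
    // crosses an operator boundary via a virtual call.
    static List<Integer> volcanoStyle(int[] input) {
        Iterator<Integer> scan = new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return i < input.length; }
            public Integer next() { return input[i++]; }
        };
        Iterator<Integer> filter = new Iterator<Integer>() {
            Integer pending = advance();
            Integer advance() {
                while (scan.hasNext()) {
                    int v = scan.next();
                    if (v > 10) return v;   // the WHERE predicate
                }
                return null;
            }
            public boolean hasNext() { return pending != null; }
            public Integer next() {
                Integer v = pending;
                pending = advance();
                return v;
            }
        };
        List<Integer> out = new ArrayList<>();
        while (filter.hasNext()) {
            out.add(filter.next() * 2);      // the projection
        }
        return out;
    }

    // Whole-stage style: what generated code for the same fragment could
    // look like once scan, filter and project are collapsed into a single
    // function. The row stays in a local variable (data locality) and
    // there are no per-row virtual calls between operators.
    static List<Integer> fused(int[] input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            if (v > 10) {
                out.add(v * 2);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {5, 12, 7, 20};
        System.out.println(volcanoStyle(data)); // [24, 40]
        System.out.println(fused(data));        // [24, 40]
    }
}
```

Both methods compute the same result; the point of the paper (and of Spark's Tungsten work) is that the fused form is what a programmer would write by hand, and a planner that already has the whole operator tree, as Calcite does, is in a position to emit it mechanically.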
