benefits of code gen

Koert Kuipers Fri, 10 Feb 2017 11:23:17 -0800

so i have been looking for a while now at all the catalyst expressions, and
all the relative complex codegen going on.


so first off i get the benefit of codegen to turn a bunch of chained
iterators transformations into a single codegen stage for spark. that makes
sense to me, because it avoids a bunch of overhead.

but what i am not so sure about is what the benefit is of converting the
actual stuff that happens inside the iterator transformations into codegen.

say if we have an expression that has 2 children and creates a struct for
them. why would this be faster in codegen by re-creating the code to do
this in a string (which is complex and error prone) compared to simply have
the codegen call the normal method for this in my class?

i see so much trivial code be re-created in codegen. stuff like this:

  private[this] def castToDateCode(
      from: DataType,
      ctx: CodegenContext): CastFunction = from match {
    case StringType =>
      val intOpt = ctx.freshName("intOpt")
      (c, evPrim, evNull) => s"""
        scala.Option<Integer> $intOpt =
          org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
        if ($intOpt.isDefined()) {
          $evPrim = ((Integer) $intOpt.get()).intValue();
        } else {
          $evNull = true;
        }
       """

is this really faster than simply calling an equivalent functions from the
codegen, and keeping the codegen logic restricted to the "unrolling" of
chained iterators?

benefits of code gen

Reply via email to