Function1 is specialized, but nullSafeEval is Any => Any, so that's still
going to box in the non-codegened execution path.

On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers <ko...@tresata.com> wrote:

> based on that i take it that math functions would be primary beneficiaries
> since they work on primitives.
>
> so if i take UnaryMathExpression as an example, would i not get the same
> benefit if i change it to this?
>
> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>
>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>   override def dataType: DataType = DoubleType
>   override def nullable: Boolean = true
>   override def toString: String = s"$name($child)"
>   override def prettyName: String = name
>
>   protected override def nullSafeEval(input: Any): Any = {
>     f(input.asInstanceOf[Double])
>   }
>
>   // name of function in java.lang.Math
>   def funcName: String = name.toLowerCase
>
>   def function(d: Double): Double = f(d)
>
>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>   }
> }
>
> admittedly in this case the benefit in terms of removing complex codegen
> is not there (the codegen was only one line), but if i can remove codegen
> here i could also remove it in lots of other places where it does get very
> unwieldy simply by replacing it with calls to methods.
>
> Function1 is specialized, so i think (or hope) that my version does no
> extra boxes/unboxing.
>
> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> With complex types it doesn't work as well, but for primitive types the
>> biggest benefit of whole stage codegen is that we don't even need to put
>> the intermediate data into rows or columns anymore. They are just variables
>> (stored in CPU registers).
>>
>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so i have been looking for a while now at all the catalyst expressions,
>>> and all the relative complex codegen going on.
>>>
>>> so first off i get the benefit of codegen to turn a bunch of chained
>>> iterators transformations into a single codegen stage for spark. that makes
>>> sense to me, because it avoids a bunch of overhead.
>>>
>>> but what i am not so sure about is what the benefit is of converting the
>>> actual stuff that happens inside the iterator transformations into codegen.
>>>
>>> say if we have an expression that has 2 children and creates a struct
>>> for them. why would this be faster in codegen by re-creating the code to do
>>> this in a string (which is complex and error prone) compared to simply have
>>> the codegen call the normal method for this in my class?
>>>
>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>
>>>   private[this] def castToDateCode(
>>>       from: DataType,
>>>       ctx: CodegenContext): CastFunction = from match {
>>>     case StringType =>
>>>       val intOpt = ctx.freshName("intOpt")
>>>       (c, evPrim, evNull) => s"""
>>>         scala.Option<Integer> $intOpt =
>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDat
>>> e($c);
>>>         if ($intOpt.isDefined()) {
>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>         } else {
>>>           $evNull = true;
>>>         }
>>>        """
>>>
>>> is this really faster than simply calling an equivalent functions from
>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>> chained iterators?
>>>
>>>
>>
>

Reply via email to