yes agreed. however i believe nullSafeEval is not used for codegen?

On Fri, Feb 10, 2017 at 4:56 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Function1 is specialized, but nullSafeEval is Any => Any, so that's still
> going to box in the non-codegened execution path.
>
> On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> based on that i take it that math functions would be primary
>> beneficiaries since they work on primitives.
>>
>> so if i take UnaryMathExpression as an example, would i not get the same
>> benefit if i change it to this?
>>
>> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>>
>>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>>   override def dataType: DataType = DoubleType
>>   override def nullable: Boolean = true
>>   override def toString: String = s"$name($child)"
>>   override def prettyName: String = name
>>
>>   protected override def nullSafeEval(input: Any): Any = {
>>     f(input.asInstanceOf[Double])
>>   }
>>
>>   // name of function in java.lang.Math
>>   def funcName: String = name.toLowerCase
>>
>>   def function(d: Double): Double = f(d)
>>
>>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>>   }
>> }
>>
>> admittedly in this case the benefit in terms of removing complex codegen
>> is not there (the codegen was only one line), but if i can remove codegen
>> here i could also remove it in lots of other places where it does get very
>> unwieldy simply by replacing it with calls to methods.
>>
>> Function1 is specialized, so i think (or hope) that my version does no
>> extra boxes/unboxing.
>>
>> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> With complex types it doesn't work as well, but for primitive types the
>>> biggest benefit of whole stage codegen is that we don't even need to put
>>> the intermediate data into rows or columns anymore. They are just variables
>>> (stored in CPU registers).
>>>
>>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> so i have been looking for a while now at all the catalyst expressions,
>>>> and all the relative complex codegen going on.
>>>>
>>>> so first off i get the benefit of codegen to turn a bunch of chained
>>>> iterators transformations into a single codegen stage for spark. that makes
>>>> sense to me, because it avoids a bunch of overhead.
>>>>
>>>> but what i am not so sure about is what the benefit is of converting
>>>> the actual stuff that happens inside the iterator transformations into
>>>> codegen.
>>>>
>>>> say if we have an expression that has 2 children and creates a struct
>>>> for them. why would this be faster in codegen by re-creating the code to do
>>>> this in a string (which is complex and error prone) compared to simply have
>>>> the codegen call the normal method for this in my class?
>>>>
>>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>>
>>>> private[this] def castToDateCode(
>>>>     from: DataType,
>>>>     ctx: CodegenContext): CastFunction = from match {
>>>>   case StringType =>
>>>>     val intOpt = ctx.freshName("intOpt")
>>>>     (c, evPrim, evNull) => s"""
>>>>       scala.Option<Integer> $intOpt =
>>>>         org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>>       if ($intOpt.isDefined()) {
>>>>         $evPrim = ((Integer) $intOpt.get()).intValue();
>>>>       } else {
>>>>         $evNull = true;
>>>>       }
>>>>     """
>>>>
>>>> is this really faster than simply calling an equivalent functions from
>>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>>> chained iterators?
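
A minimal sketch of the boxing point raised in this thread, in plain Scala with no Spark dependency. The names `BoxingSketch`, `evalAny` and `evalDouble` are hypothetical and exist only for illustration; the only things taken from the thread are the `Any => Any` shape of nullSafeEval and the `Double => Double` function argument. It shows that a value passing through an `Any`-typed signature is a `java.lang.Double` object at runtime, while a call that stays typed as `Double` can remain on primitives.

```scala
object BoxingSketch {
  // Same shape as nullSafeEval(input: Any): Any from the thread.
  // The argument arrives as a boxed java.lang.Double, is unboxed by the cast,
  // and the result is boxed again because the return type is Any.
  def evalAny(f: Double => Double)(input: Any): Any =
    f(input.asInstanceOf[Double])

  // Same shape as function(d: Double): Double from the thread.
  // Both the argument and the result are typed as primitive doubles.
  def evalDouble(f: Double => Double)(d: Double): Double =
    f(d)

  def main(args: Array[String]): Unit = {
    val sqrt: Double => Double = d => math.sqrt(d)

    val viaAny: Any = evalAny(sqrt)(4.0)
    // The runtime class of a Double held in an Any is the boxed wrapper.
    println(viaAny.getClass.getName)

    val viaDouble: Double = evalDouble(sqrt)(4.0)
    println(viaDouble)
  }
}
```

This only illustrates the type-level difference between the two signatures. Whether the boxing matters in practice depends on which path Spark actually takes for a given query, which is the open question in the reply at the top.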