thanks for that detailed response!

On Mon, Feb 13, 2017 at 12:56 AM, Sumedh Wale <sw...@snappydata.io> wrote:

> The difference is a closure invocation instead of a static java.lang.Math
> call. In many cases the JIT may not be able to perform inlining and related
> code optimizations, though in this specific case it should. This is highly
> dependent on the specific case, but when inlining cannot be done and it
> leads to a method call (especially a virtual call) then the difference is
> quite large: a few nanoseconds per evaluation vs tens of nanoseconds in my
> experiments.
> Serialization of an additional object as a reference can have a measurable
> effect for low-latency jobs, though it can usually be ignored.
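>
> As a minimal sketch of the two call shapes being compared (not a proper
> benchmark; a real comparison needs something like JMH), the difference is
> between the static Math call and the call through Function1.apply:
>
> object CallShapeSketch {
>   // the closure, roughly as it would be held via ctx.addReferenceObj
>   val f: Double => Double = (d: Double) => java.lang.Math.sqrt(d)
>
>   def viaStatic(xs: Array[Double]): Double = {
>     var sum = 0.0
>     var i = 0
>     while (i < xs.length) {
>       sum += java.lang.Math.sqrt(xs(i)) // static call, easily inlined by the JIT
>       i += 1
>     }
>     sum
>   }
>
>   def viaClosure(xs: Array[Double]): Double = {
>     var sum = 0.0
>     var i = 0
>     while (i < xs.length) {
>       sum += f(xs(i)) // Function1.apply: a virtual call, inlining is up to the JIT
>       i += 1
>     }
>     sum
>   }
> }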
>
> What has been observed is that if an expression uses CodegenFallback then
> it becomes an order of magnitude slower or more. Most of that is due to
> UnsafeRow read/write overhead, which is avoided here, but care still needs
> to be taken with (virtual) function calls too. In some cases the JIT does
> inline virtual calls, but that may not always happen. In my experience the
> only reliable case where it does inline is when the virtual call is on a
> local variable that does not change across invocations (e.g. a final local
> variable outside the while loop of a doProduce).
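>
> A hedged sketch of that one reliable pattern (names are made up, this is
> not actual generated code): read the reference into a local that does not
> change, outside the loop, so every call inside the loop sees the same
> receiver:
>
> class StableReceiverSketch(val f: Double => Double) {
>   def sum(xs: Array[Double]): Double = {
>     val localF = f          // hoisted once, like a final local outside the doProduce loop
>     var acc = 0.0
>     var i = 0
>     while (i < xs.length) {
>       acc += localF(xs(i))  // loop-invariant receiver, easier for the JIT to inline
>       i += 1
>     }
>     acc
>   }
> }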
>
> I think what should work better is encapsulating such code in methods of a
> Scala object rather than a class, so that they can be invoked in generated
> code like static methods. Such calls should be equivalent to inline code
> generation in most cases, since the JIT will inline the calls where it
> determines a significant benefit. In some cases such method calls will have
> better CPU instruction cache hit rates (i.e. when the same inline code is
> emitted multiple times vs. common method calls). All this needs thorough
> micro/macro-benchmarking.
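>
> A minimal sketch of that suggestion (the object and package names are only
> illustrative): keep the logic in a Scala object and have the generated code
> call it, which from Java looks essentially like a static call:
>
> object MathFuncs {
>   def clampedSqrt(d: Double): Double =
>     if (d < 0.0) 0.0 else java.lang.Math.sqrt(d)
> }
>
> // The generated Java would then reference it roughly as
> //   double r = com.example.MathFuncs$.MODULE$.clampedSqrt(input);
> // or, via the static forwarders the compiler emits for a top-level object,
> //   double r = com.example.MathFuncs.clampedSqrt(input);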
>
> However, I don't recall any large pieces of generated code where this can
> help. Most complex pieces, like those in
> HashAggregateExec/SortMergeJoinExec/BroadcastHashJoinExec, are complex
> because they generate schema-specific code (to avoid virtual calls and
> boxing/unboxing, and UnsafeRow read/write in some cases) which is
> significantly faster than the equivalent generic code in doExecute. As for
> your "castToDateCode" example, I don't see how you can reduce it, since the
> bulk of the code is already in the static stringToDate call.
>
>
>
> On Saturday 11 February 2017 03:02 AM, Koert Kuipers wrote:
>
> based on that i take it that math functions would be the primary
> beneficiaries, since they work on primitives.
>
> so if i take UnaryMathExpression as an example, would i not get the same
> benefit if i change it to this?
>
> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>
>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>   override def dataType: DataType = DoubleType
>   override def nullable: Boolean = true
>   override def toString: String = s"$name($child)"
>   override def prettyName: String = name
>
>   protected override def nullSafeEval(input: Any): Any = {
>     f(input.asInstanceOf[Double])
>   }
>
>   // name of function in java.lang.Math
>   def funcName: String = name.toLowerCase
>
>   def function(d: Double): Double = f(d)
>
>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>   }
> }
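>
> A concrete subclass would then keep passing its function as a closure, e.g.
> roughly:
>
> case class Sqrt(child: Expression) extends UnaryMathExpression(math.sqrt, "SQRT")
> case class Cbrt(child: Expression) extends UnaryMathExpression(math.cbrt, "CBRT")
>
> so the generated code for Sqrt would call something like
> ((Sqrt) references[/* index */]).function(input) instead of
> java.lang.Math.sqrt(input).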
>
> admittedly in this case the benefit in terms of removing complex codegen
> is not there (the codegen was only one line), but if i can remove codegen
> here i could also remove it in lots of other places where it does get very
> unwieldy simply by replacing it with calls to methods.
>
> Function1 is specialized, so i think (or hope) that my version does no
> extra boxing/unboxing.
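>
> a quick sketch of why (assuming nothing beyond Function1's @specialized
> annotations): a Double => Double closure implements the primitive
> apply$mcDD$sp, so calls through the specialized static type should not box:
>
> object SpecializationCheck {
>   val square: Double => Double = d => d * d
>   // compiles to square.apply$mcDD$sp(3.0), i.e. primitive in, primitive out
>   def run(): Double = square(3.0)
> }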
>
> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> With complex types it doesn't work as well, but for primitive types the
>> biggest benefit of whole stage codegen is that we don't even need to put
>> the intermediate data into rows or columns anymore. They are just variables
>> (stored in CPU registers).
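>>
>> As a rough sketch of that contrast (not actual generated code), the fused
>> loop keeps each intermediate value in a plain local instead of writing it
>> into a row between operators:
>>
>> object FusedVsRowBased {
>>   // per-operator style: each step materializes its result into a "row"
>>   def rowBased(input: Array[Double]): Double = {
>>     var acc = 0.0
>>     for (v <- input) {
>>       val row1 = Array[Any](v + 1.0)
>>       val row2 = Array[Any](row1(0).asInstanceOf[Double] * 2.0)
>>       acc += row2(0).asInstanceOf[Double]
>>     }
>>     acc
>>   }
>>
>>   // fused style: intermediates stay in locals (candidates for CPU registers)
>>   def fused(input: Array[Double]): Double = {
>>     var acc = 0.0
>>     var i = 0
>>     while (i < input.length) {
>>       val a = input(i) + 1.0
>>       val b = a * 2.0
>>       acc += b
>>       i += 1
>>     }
>>     acc
>>   }
>> }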
>>
>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so i have been looking for a while now at all the catalyst expressions,
>>> and all the relatively complex codegen going on.
>>>
>>> so first off i get the benefit of codegen to turn a bunch of chained
>>> iterator transformations into a single codegen stage for spark. that makes
>>> sense to me, because it avoids a bunch of overhead.
>>>
>>> but what i am not so sure about is what the benefit is of converting the
>>> actual stuff that happens inside the iterator transformations into codegen.
>>>
>>> say if we have an expression that has 2 children and creates a struct
>>> for them. why would this be faster in codegen by re-creating the code to do
>>> this in a string (which is complex and error-prone) compared to simply
>>> having the codegen call the normal method for this in my class?
>>>
>>> i see so much trivial code being re-created in codegen. stuff like this:
>>>
>>>   private[this] def castToDateCode(
>>>       from: DataType,
>>>       ctx: CodegenContext): CastFunction = from match {
>>>     case StringType =>
>>>       val intOpt = ctx.freshName("intOpt")
>>>       (c, evPrim, evNull) => s"""
>>>         scala.Option<Integer> $intOpt =
>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>         if ($intOpt.isDefined()) {
>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>         } else {
>>>           $evNull = true;
>>>         }
>>>        """
>>>
>>> is this really faster than simply calling an equivalent function from
>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>> chained iterators?
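>>>
>>> for illustration, the "equivalent function" alternative could look roughly
>>> like this (castStringToDate is a hypothetical helper, not an existing Spark
>>> method), leaving the generated code as a single call plus a null check:
>>>
>>> object CastHelpers {
>>>   import org.apache.spark.sql.catalyst.util.DateTimeUtils
>>>   import org.apache.spark.unsafe.types.UTF8String
>>>
>>>   // days since epoch, or null when the string is not a valid date
>>>   def castStringToDate(s: UTF8String): java.lang.Integer =
>>>     DateTimeUtils.stringToDate(s).map(Int.box).orNull
>>> }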
>>>
>>>
>>
>
>
