so i have been looking for a while now at all the catalyst expressions, and at all the relatively complex codegen going on.
so first off, i get the benefit of codegen for turning a bunch of chained iterator transformations into a single codegen stage for spark. that makes sense to me, because it avoids a bunch of overhead.

what i am not so sure about is the benefit of converting the actual work that happens inside the iterator transformations into codegen. say we have an expression that has 2 children and creates a struct from them. why would this be faster in codegen, re-creating the code to do this in a string (which is complex and error prone), compared to simply having the codegen call the normal method for this in my class?

i see so much trivial code being re-created in codegen. stuff like this:

    private[this] def castToDateCode(
        from: DataType,
        ctx: CodegenContext): CastFunction = from match {
      case StringType =>
        val intOpt = ctx.freshName("intOpt")
        (c, evPrim, evNull) => s"""
          scala.Option<Integer> $intOpt =
            org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
          if ($intOpt.isDefined()) {
            $evPrim = ((Integer) $intOpt.get()).intValue();
          } else {
            $evNull = true;
          }
        """

is this really faster than simply calling an equivalent function from the codegen, and keeping the codegen logic restricted to the "unrolling" of chained iterators?
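to make the question concrete, here is a rough java sketch of the two shapes i am comparing. it is not spark code: `stringToDate` below is a made-up stand-in built on java.time (not spark's DateTimeUtils), and `Result`, `inlinedGlue`, `castToDate` and `delegating` are names i invented for illustration. it only shows what each shape looks like, it does not measure anything or claim one is faster.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class CastToDateSketch {

    // hypothetical stand-in for DateTimeUtils.stringToDate (NOT spark's implementation):
    // returns days since 1970-01-01, or empty when the string is not an ISO date.
    static Optional<Integer> stringToDate(String s) {
        if (s == null) {
            return Optional.empty();
        }
        try {
            return Optional.of((int) LocalDate.parse(s.trim()).toEpochDay());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // made-up holder for the (value, isNull) pair that codegen keeps in two local variables.
    static final class Result {
        int value;
        boolean isNull;
    }

    // shape (a): the glue (option check, unboxing, null flag) is spelled out
    // in the generated code itself, like the castToDateCode snippet does.
    static Result inlinedGlue(String c) {
        Result r = new Result();
        Optional<Integer> intOpt = stringToDate(c);
        if (intOpt.isPresent()) {
            r.value = intOpt.get().intValue();
        } else {
            r.isNull = true;
        }
        return r;
    }

    // an ordinary hand-written method that does the whole cast; null means "no value".
    static Integer castToDate(String c) {
        return stringToDate(c).orElse(null);
    }

    // shape (b): the generated code only calls the ordinary method and copies the result out.
    static Result delegating(String c) {
        Result r = new Result();
        Integer v = castToDate(c);
        if (v == null) {
            r.isNull = true;
        } else {
            r.value = v;
        }
        return r;
    }

    public static void main(String[] args) {
        Result a = inlinedGlue("1970-01-02");
        Result b = delegating("1970-01-02");
        System.out.println(a.value + " " + a.isNull + " / " + b.value + " " + b.isNull);
    }
}
```

both shapes give the same answer for the same input. shape (b) keeps the logic in normal compiled code and leaves only a call plus a null check in the generated string, which is the alternative i am asking about.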