That is a good question. Yes, if we want to enable code generation in such
a scenario it would also need Janino, which increases our footprint by
roughly 0.6MB.

Btw, Janino fits much better into such an in-memory deployment because it
compiles classes in-memory without the need to write class files into a
local working directory. The same could be done for the standard compiler
API, but that would require custom in-memory file handling.
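
For reference, here is a rough sketch of what in-memory compilation could look
like with the JDK's standard compiler API, assuming javax.tools.JavaCompiler
is the API in question. All class names below (InMemoryCompileSketch,
StringSource, ByteCode, MemoryFileManager, MemoryClassLoader) are hypothetical
and not taken from our code base:

```java
import javax.tools.*;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compile a source string with javax.tools without
// writing .java or .class files to a working directory.
final class InMemoryCompileSketch {

    /** Source code held in a String instead of a .java file. */
    static class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Class file whose bytes are kept in a buffer instead of on disk. */
    static class ByteCode extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("mem:///" + className.replace('.', '/')
                + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** File manager that redirects the compiler's class output into memory. */
    static class MemoryFileManager
            extends ForwardingJavaFileManager<StandardJavaFileManager> {
        final Map<String, ByteCode> classes = new HashMap<>();
        MemoryFileManager(StandardJavaFileManager delegate) {
            super(delegate);
        }
        @Override
        public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                String className, JavaFileObject.Kind kind, FileObject sibling) {
            ByteCode byteCode = new ByteCode(className);
            classes.put(className, byteCode);
            return byteCode;
        }
    }

    /** Class loader that defines classes from the in-memory byte code. */
    static class MemoryClassLoader extends ClassLoader {
        private final Map<String, ByteCode> classes;
        MemoryClassLoader(Map<String, ByteCode> classes) {
            super(InMemoryCompileSketch.class.getClassLoader());
            this.classes = classes;
        }
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            ByteCode byteCode = classes.get(name);
            if (byteCode == null) throw new ClassNotFoundException(name);
            byte[] b = byteCode.bytes.toByteArray();
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Compiles the given source and loads the resulting class. */
    static Class<?> compile(String className, String source) throws Exception {
        // Returns null when only a JRE (no compiler) is available.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null)
            throw new IllegalStateException("No system Java compiler (JDK required)");
        try (MemoryFileManager fm = new MemoryFileManager(
                compiler.getStandardFileManager(null, null, null))) {
            Boolean ok = compiler.getTask(null, fm, null, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
            if (!ok) throw new IllegalArgumentException("Compilation failed: " + className);
            return new MemoryClassLoader(fm.classes).loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> c = compile("Gen",
            "public class Gen { public static int twice(int x) { return 2 * x; } }");
        System.out.println(c.getMethod("twice", int.class).invoke(null, 21));
    }
}
```

The three helper classes are the extra plumbing the standard API needs to stay
in memory; the null check in compile() also shows where a plain JRE would fail.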


On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald wrote:

> Sounds like a good idea.
> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will
> the dependency on Janino still be there (that question applies to JDK as
> well), and what is the footprint?
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail:
> From:   Matthias Boehm
> To:
> Date:   03/31/2017 08:17 PM
> Subject:        Java compiler for code generation
> Hi all,
> currently, our new code generator for operator fusion uses the
> programmatic compiler interface, which is Java's standard API for
> compilation. Despite a plan cache that mitigates unnecessary compilation
> and recompilation overheads, we still see significant end-to-end overhead
> especially for small input data.
> Moving forward, I'd like to switch to Janino
> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
> compiler with restricted language support. The advantages are
> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
> and MLogreg, Janino improved total javac compilation time from 2.039 to
> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854 to
> 0.283 (46 operators), respectively. At the same time, there was no
> measurable impact on runtime efficiency, and JIT compilation overhead was
> even slightly reduced.
> (2) Removed JDK requirement: Using the standard compiler API
> requires the existence of a JDK, while Janino only requires a JRE, which
> makes it easier to apply code generation by default.
> However, I'm raising this here as Janino would add another explicit
> dependency (with BSD license). Fortunately, Spark also uses Janino for
> whole-stage-codegen, so we should be able to mark Janino as a provided
> library. The only issue is a pure Hadoop environment, where we still want
> to use code generation for CP operations. To simplify the build, I could
> imagine using the standard compiler API for Hadoop execution types, but
> Janino by default.
> If you have any concerns, please let me know by Monday; otherwise I'd like
> to push this change into our upcoming 0.14 release.
> Regards,
> Matthias
