That is a good question. Yes, if we want to enable code generation in such a scenario, it would also need Janino, which increases our footprint by roughly 0.6MB.
Btw, Janino fits much better into such an in-memory deployment because it compiles classes in-memory without the need to write class files into a local working directory. The same could be done for javax.tools.JavaCompiler, but it would require a custom in-memory JavaFileManager.

Regards,
Matthias

On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald <reinw...@us.ibm.com> wrote:

> Sounds like a good idea.
>
> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will
> the dependency on Janino still be there (that question applies to JDK as
> well), and what is the footprint?
>
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
>
>
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 03/31/2017 08:17 PM
> Subject: Java compiler for code generation
>
>
> Hi all,
>
> currently, our new code generator for operator fusion, uses the
> programmatic javax.tools.JavaCompiler, which is Java's standard API for
> compilation. Despite a plan cache that mitigates unnecessary compilation
> and recompilation overheads, we still see significant end-to-end overhead
> especially for small input data.
>
> Moving forward, I'd like to switch to Janino
> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
> compiler with restricted language support. The advantages are
>
> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
> and MLogreg, Janino improved total javac compilation time from 2.039 to
> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854
> to 0.283 (46 operators), respectively. At the same time, there was no
> measurable impact on runtime efficiency, but even slightly reduced JIT
> compilation overhead.
>
> (2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler
> requires the existence of a JDK, while Janino only requires a JRE, which
> means it makes it easier to apply code generation by default.
>
> However, I'm raising this here as Janino would add another explicit
> dependency (with BSD license). Fortunately, Spark also uses Janino for
> whole-stage-codegen. So we should be able to mark Janino as provided
> library. The only issue is a pure Hadoop environment, where we still want
> to use code generation for CP operations. To simplify the build, I could
> imagine using the javax.tools.JavaCompiler for hadoop execution types, but
> Janino by default.
>
> If you have any concerns, please let me know by Monday; otherwise I'd like
> to push this change into our upcoming 0.14 release.
>
>
> Regards,
> Matthias
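
For illustration, the custom in-memory JavaFileManager mentioned in the reply at the top of this thread could look roughly like the following. This is only a minimal sketch under stated assumptions, not the SystemML implementation: all class names (InMemoryCompileSketch, StringSource, ByteCode, MemoryFileManager, MemoryClassLoader) are hypothetical, and it still needs a JDK at runtime because ToolProvider.getSystemJavaCompiler() returns null when no system compiler is available.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical sketch: compile a source string with javax.tools.JavaCompiler
// while keeping both the source and the generated class file in memory.
final class InMemoryCompileSketch {

    // Source code held in a String instead of a .java file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiler output captured in a byte array instead of a .class file.
    static final class ByteCode extends SimpleJavaFileObject {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCode(String className) {
            super(URI.create("mem:///" + className.replace('.', '/')
                + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() { return bytes; }
        byte[] toByteArray() { return bytes.toByteArray(); }
    }

    // Redirects all class-file output into ByteCode objects kept in a map.
    static final class MemoryFileManager
            extends ForwardingJavaFileManager<StandardJavaFileManager> {
        final Map<String, ByteCode> classes = new HashMap<>();
        MemoryFileManager(StandardJavaFileManager delegate) { super(delegate); }
        @Override
        public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                String className, JavaFileObject.Kind kind, FileObject sibling) {
            ByteCode out = new ByteCode(className);
            classes.put(className, out);
            return out;
        }
    }

    // Defines classes directly from the captured byte arrays.
    static final class MemoryClassLoader extends ClassLoader {
        private final Map<String, ByteCode> classes;
        MemoryClassLoader(Map<String, ByteCode> classes, ClassLoader parent) {
            super(parent);
            this.classes = classes;
        }
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            ByteCode bc = classes.get(name);
            if (bc == null) throw new ClassNotFoundException(name);
            byte[] b = bc.toByteArray();
            return defineClass(name, b, 0, b.length);
        }
    }

    // Compiles the given source and returns the loaded class.
    static Class<?> compile(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null)
            throw new IllegalStateException("No system Java compiler available (JRE only?)");
        try (MemoryFileManager fm = new MemoryFileManager(
                compiler.getStandardFileManager(null, null, null))) {
            boolean ok = compiler.getTask(null, fm, null, null, null,
                Collections.singletonList(new StringSource(className, source))).call();
            if (!ok)
                throw new IllegalArgumentException("Compilation failed for " + className);
            return new MemoryClassLoader(fm.classes,
                InMemoryCompileSketch.class.getClassLoader()).loadClass(className);
        }
    }
}
```

A caller would pass a class name and its source text to compile(...) and then invoke the generated class via reflection. Nothing is written to a working directory, which is the property the reply attributes to Janino out of the box.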