Using Janino sounds like a great idea.  As for the footprint size for Java-only 
execution modes, it might make sense to do an audit of our current dependencies 
to see if anything can be removed to make up for the additional amount.  Then 
we could just use it in all scenarios without worry.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 31, 2017, at 9:25 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> That is a good question. Yes, if we want to enable code generation in such
> a scenario it would also need Janino, which increases our footprint by
> roughly 0.6MB.
> 
> Btw, Janino fits much better into such an in-memory deployment because it
> compiles classes in-memory without the need to write class files into a
> local working directory. The same could be done for
> javax.tools.JavaCompiler, but that would require a custom in-memory
> JavaFileManager.
> 
> Regards,
> Matthias
> 
> On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald <reinw...@us.ibm.com>
> wrote:
> 
>> Sounds like a good idea.
>> 
>> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will
>> the dependency on Janino still be there (that question applies to JDK as
>> well), and what is the footprint?
>> 
>> Regards,
>> Berthold Reinwald
>> IBM Almaden Research Center
>> office: (408) 927 2208; T/L: 457 2208
>> e-mail: reinw...@us.ibm.com
>> 
>> 
>> 
>> From:   Matthias Boehm <mboe...@googlemail.com>
>> To:     dev@systemml.incubator.apache.org
>> Date:   03/31/2017 08:17 PM
>> Subject:        Java compiler for code generation
>> 
>> 
>> 
>> Hi all,
>> 
>> Currently, our new code generator for operator fusion uses the
>> programmatic javax.tools.JavaCompiler, which is Java's standard API for
>> compilation. Despite a plan cache that mitigates unnecessary compilation
>> and recompilation overheads, we still see significant end-to-end overhead
>> especially for small input data.
>> 
>> Moving forward, I'd like to switch to Janino
>> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
>> compiler with restricted language support. The advantages are
>> 
>> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
>> and MLogreg, Janino improved total javac compilation time from 2.039 to
>> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854
>> to 0.283 (46 operators), respectively. At the same time, there was no
>> measurable impact on runtime efficiency, and JIT compilation overhead was
>> even slightly reduced.
>> 
>> (2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler
>> requires the existence of a JDK, while Janino only requires a JRE, which
>> makes it easier to apply code generation by default.
>> 
>> However, I'm raising this here as Janino would add another explicit
>> dependency (with BSD license). Fortunately, Spark also uses Janino for
>> whole-stage-codegen, so we should be able to mark Janino as a provided
>> library. The only issue is a pure Hadoop environment, where we still want
>> to use code generation for CP operations. To simplify the build, I could
>> imagine using javax.tools.JavaCompiler for Hadoop execution types, but
>> Janino by default.
>> 
>> If you have any concerns, please let me know by Monday; otherwise I'd like
>> to push this change into our upcoming 0.14 release.
>> 
>> 
>> Regards,
>> Matthias
>> 
>> 
>> 
>> 
>> 
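
The remark in the thread that javax.tools.JavaCompiler could also compile without writing class files to a working directory, given a custom in-memory JavaFileManager, can be sketched roughly as follows. This is only an illustration using the standard javax.tools API, not SystemML's actual code: the class names (InMemoryCompileSketch, StringSource, ByteCodeSink, MemoryFileManager, MemoryClassLoader) and the generated class Gen are hypothetical. As noted in the thread, this route still needs a JDK at runtime.

```java
import javax.tools.*;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compile Java source held in a String and load the
// resulting class without touching the local file system.
public class InMemoryCompileSketch {

    /** Source "file" whose content comes from a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Class "file" that collects the emitted bytecode in a byte array. */
    static final class ByteCodeSink extends SimpleJavaFileObject {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteCodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/')
                    + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
        byte[] toByteArray() {
            return bytes.toByteArray();
        }
    }

    /** File manager that redirects all class output to memory. */
    static final class MemoryFileManager
            extends ForwardingJavaFileManager<StandardJavaFileManager> {
        final Map<String, ByteCodeSink> classes = new HashMap<>();
        MemoryFileManager(StandardJavaFileManager delegate) {
            super(delegate);
        }
        @Override
        public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                String className, JavaFileObject.Kind kind, FileObject sibling) {
            ByteCodeSink sink = new ByteCodeSink(className);
            classes.put(className, sink);
            return sink;
        }
    }

    /** Class loader that defines classes from the in-memory bytecode. */
    static final class MemoryClassLoader extends ClassLoader {
        private final Map<String, ByteCodeSink> classes;
        MemoryClassLoader(Map<String, ByteCodeSink> classes) {
            super(InMemoryCompileSketch.class.getClassLoader());
            this.classes = classes;
        }
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            ByteCodeSink sink = classes.get(name);
            if (sink == null) throw new ClassNotFoundException(name);
            byte[] b = sink.toByteArray();
            return defineClass(name, b, 0, b.length);
        }
    }

    public static Class<?> compileAndLoad(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) // happens when only a JRE is available
            throw new IllegalStateException("No system Java compiler found");
        try (MemoryFileManager fm = new MemoryFileManager(
                compiler.getStandardFileManager(null, null, null))) {
            JavaCompiler.CompilationTask task = compiler.getTask(null, fm, null, null,
                    null, Collections.singletonList(new StringSource(className, source)));
            if (!task.call())
                throw new IllegalStateException("Compilation failed for " + className);
            return new MemoryClassLoader(fm.classes).loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        String src = "public class Gen { public static int twice(int x) { return 2 * x; } }";
        Class<?> c = compileAndLoad("Gen", src);
        System.out.println(c.getMethod("twice", int.class).invoke(null, 21));
    }
}
```

The amount of plumbing here (two file objects, a forwarding file manager, and a class loader) is the extra work the thread alludes to; Janino's SimpleCompiler does the in-memory step for you.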
