Re: Discussion SYSTEMML-593 MLContext Resign

Frederick R Reiss Thu, 07 Apr 2016 09:27:00 -0700

Luciano Resende <[email protected]> wrote on 04/02/2016 10:20:06 PM:


> From: Luciano Resende <[email protected]>
> To: [email protected]
> Cc: Deron Eriksson/San Francisco/IBM@IBMUS
> Date: 04/02/2016 10:20 PM
> Subject: Re: Discussion SYSTEMML-593 MLContext Resign
>
> On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm <[email protected]> wrote:
>
> >
> >
> > thanks Deron for initiating the discussion around the rework of our
> > MLContext API (https://issues.apache.org/jira/browse/SYSTEMML-593).
> > Here are a couple of thoughts:
> >
> > (1) Simplicity: Given that the primary use case of MLContext calls a
> > script exactly once, I'm wondering if the separation into Script,
> > ScriptFactory, ScriptExecutor and MLContext adds unnecessary complexity
> > by requiring more code to set up. It would be great to see old vs new
> > examples side by side. Also rather than introducing another exception
> > class, couldn't we just reuse DMLException by making it an uncaught
> > exception?
> >
>
> Simplicity and ease of use will, in my view, dictate user adoption. One
> way I usually accomplish this is to start by building an app or test
> scenario and use that to model the user experience when using the APIs.
> This might help identify how many steps the user needs to handle before
> actually using the API (e.g. Script, ScriptFactory, etc.) and whether
> those steps are really necessary... But, to be successful with this
> approach, you really need to come with a clean mind, and really think as
> a user trying to use the API.
>
> Having said that, sometimes we do need a fine-grained API, but sometimes
> that might be used internally (private APIs) and be exposed to the user
> through a more coarse-grained API that hides much of the detail and
> exposes a simple programming model to the user.
>
> Also, I like the option to have a fluent API, which makes the code more
> readable and easy to use.


I think the proposed approach moves in the direction of greater
cleanliness, doesn't add complexity, and makes some tasks less complex. For
example, it's easier to write an application that calls out to multiple DML
scripts if you have an abstraction for a "prepared" script, a la a prepared
SQL statement.
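
To make the "prepared script" idea concrete, here is a minimal sketch in
plain Java. Everything in it is an assumption made for illustration: the
PreparedScript class, its in() and render() methods, and the use of simple
string substitution as a stand-in for real execution are hypothetical and
are not taken from the design doc or from SystemML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PreparedScriptSketch {

    /** Hypothetical "prepared" script: created once, bound and run many times. */
    static final class PreparedScript {
        private final String source;
        private final Map<String, Object> inputs = new LinkedHashMap<>();

        PreparedScript(String source) {
            this.source = source;
        }

        /** Binds (or rebinds) a $-parameter; returns this to allow chaining. */
        PreparedScript in(String name, Object value) {
            inputs.put(name, value);
            return this;
        }

        /** Stand-in for execution: substitutes each bound $name into the text. */
        String render() {
            String result = source;
            for (Map.Entry<String, Object> e : inputs.entrySet()) {
                result = result.replace("$" + e.getKey(), String.valueOf(e.getValue()));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Two scripts are prepared once, then reused with new bindings.
        PreparedScript addX = new PreparedScript("B = A + $X;");
        PreparedScript scale = new PreparedScript("C = B * $F;");
        for (int x = 1; x <= 2; x++) {
            System.out.println(addX.in("X", x).render());
            System.out.println(scale.in("F", 10 * x).render());
        }
    }
}
```

The point of the sketch is only that each script object carries its own
bindings, so an application can hold several of them and rerun each one
without resetting shared state in between.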

Regarding Matthias's question about old vs new examples of the "run a
single script once" use case, here you go:

OLD:
String str = "x=$X; A=read($Ain); B=A+x; write(B, 'temp');";
ml.reset();
ml.registerInput("X", 10);
ml.registerInput("Ain", csv, sc.textFile("m.csv"));
ml.registerOutput("B");
MLOutput output = ml.execute(str);
MLMatrix matrix = output.getMLMatrix(ml, sqlContext, "B");


NEW:
String str = "x=$X; A=read($Ain); B=A+x; write(B, 'temp');";
Script script = ScriptFactory.dml(str);
script.in("$X", 10).in("A", sc.textFile("m.csv")).regOut("B");
ml.execute(script);
BinaryBlockMatrix matrix = script.out("B");

The new version has about the same amount of code (five statements versus
seven).

One question I have about the design, though: What happens to outputs of
the most recent run when someone runs a new script? Are we leaking
persisted RDDs or unpersisting RDDs that the calling application might have
a pointer to?
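
To illustrate the second half of that question, here is a toy sketch in
plain Java. The Handle and Context classes and the "release previous
outputs on the next run" policy are hypothetical, chosen only to show the
hazard; they do not describe how MLContext behaves today or how the
proposed design would behave.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputLifetimeSketch {

    /** Hypothetical stand-in for a cached result such as a persisted RDD. */
    static final class Handle {
        final String name;
        boolean persisted = true;

        Handle(String name) {
            this.name = name;
        }
    }

    /** Hypothetical context that releases the previous run's outputs on each new run. */
    static final class Context {
        private final List<Handle> lastOutputs = new ArrayList<>();

        Handle execute(String outputName) {
            // Eager cleanup: mark everything from the previous run as released.
            for (Handle h : lastOutputs) {
                h.persisted = false;
            }
            lastOutputs.clear();
            Handle out = new Handle(outputName);
            lastOutputs.add(out);
            return out;
        }
    }

    public static void main(String[] args) {
        Context ml = new Context();
        Handle first = ml.execute("B"); // caller keeps a reference to this output
        ml.execute("C");                // second run releases the first run's output
        System.out.println(first.name + " still persisted? " + first.persisted);
    }
}
```

With eager cleanup the caller's reference to the first output goes stale;
with no cleanup at all, the same structure would instead accumulate
persisted outputs that nothing ever releases.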

>
> https://en.wikipedia.org/wiki/Fluent_interface#Java
>
>
> >
> > (2) Compilation: ScriptExecutor would also be yet another replica of
> > our compilation chain (besides DMLScript, Debugger, JMLC, MLContext).
> > Please, keep in mind that we are about to consolidate this, centralizing
> > these calls via a configurable compilation chain because it really
> > becomes a maintenance nightmare (as recently seen when we reworked our
> > thread-local configuration management).

Agreed, we should include the necessary refactoring to centralize the
top-level control of the compilation chain in the work proposed in this
design doc.

> >
> > (3) Data Representation / Converters: Making data conversions and
> > input/output handling easier is certainly useful. However, isn't this
> > new class hierarchy redundant with our already existing hierarchy of
> > Data, MatrixObject, FrameObject, and ScalarObject?
> >
>
> Is there a way to hide some of the input/output registration calls by
> some convention over API calls?
>
>
> >
> > (4) Consolidating MLContext and JMLC: This is a good idea since
> > MLContext is anyway "derived" from JMLC and both rely on the same
> > concepts for input/output handling and program modification (see
> > JMLCUtils). Down the road I would like to see a convergence to something
> > like Script and PreparedScript in the spirit of JDBC's Statement and
> > PreparedStatement (btw, that's how we created JMLC). Let's keep in mind
> > that there are already a number of users working against both MLContext
> > and JMLC, so we should support them separately until our major 1.0
> > release.
> >
> >
> > Regards,
> > Matthias
> >
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
