Luciano Resende <[email protected]> wrote on 04/02/2016 10:20:06 PM:
> From: Luciano Resende <[email protected]>
> To: [email protected]
> Cc: Deron Eriksson/San Francisco/IBM@IBMUS
> Date: 04/02/2016 10:20 PM
> Subject: Re: Discussion SYSTEMML-593 MLContext Resign
>
> On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm <[email protected]> wrote:
>
> > thanks Deron for initiating the discussion around the rework of our
> > MLContext API (https://issues.apache.org/jira/browse/SYSTEMML-593). Here
> > are a couple of thoughts:
> >
> > (1) Simplicity: Given that the primary use case of MLContext calls a script
> > exactly once, I'm wondering if the separation into Script, ScriptFactory,
> > ScriptExecutor and MLContext adds unnecessary complexity by requiring more
> > code to set up. It would be great to see old vs new examples side by side.
> > Also rather than introducing another exception class, couldn't we just
> > reuse DMLException by making it an uncaught exception?
> >
>
> Simplicity and ease of use will, in my view, dictate user adoption. One way
> I usually accomplish this is to start by building an app or test
> scenario and use that to model the user experience when using the APIs.
> This might help identify how many steps the user needs to handle before
> actually using the api (e.g. Script, ScriptFactory, etc) and if those steps
> are really necessary... But, to be successful with this approach, you
> really need to come with a clean mind, and really think as a user trying to
> use the API.
>
> Having said that, sometimes we do need a fine grained api, but sometimes
> that might be used internally (private apis) and be exposed to the user
> with a more coarse grained api that hides much of the details and exposes a
> simple programming model to the user.
>
> Also, I like the option to have a fluent API, which makes the code more
> readable and easy to use.

I think the proposed approach moves in the direction of greater cleanliness,
doesn't add complexity, and makes some tasks less complex.
For example, it's easier to write an application that calls out to multiple
DML scripts if you have an abstraction for a "prepared" script, a la a
prepared SQL statement.

Regarding Matthias's question about old vs new examples of the "run a single
script once" use case, here you go:

OLD:
    String str = "x=$X; A=read($Ain); B=A+x; write(B, 'temp');";
    ml.reset();
    ml.registerInput("X", 10);
    ml.registerInput("Ain", csv, sc.textFile("m.csv"));
    ml.registerOutput("B");
    MLOutput output = ml.execute(str);
    MLMatrix matrix = output.getMLMatrix(ml, sqlContext, "B");

NEW:
    String str = "x=$X; A=read($Ain); B=A+x; write(B, 'temp');";
    Script script = ScriptFactory.dml(str);
    script.in("$X", 10).in("A", sc.textFile("m.csv")).regOut("B");
    ml.execute(script);
    BinaryBlockMatrix matrix = script.out("B");

The new version has about the same amount of code.

One question I have about the design, though: What happens to outputs of the
most recent run when someone runs a new script? Are we leaking persisted RDDs
or unpersisting RDDs that the calling application might have a pointer to?

>
> https://en.wikipedia.org/wiki/Fluent_interface#Java
>
> >
> > (2) Compilation: ScriptExecutor would also be yet another replicate of our
> > compilation chain (beside DMLScript, Debugger, JMLC, MLContext). Please,
> > keep in mind that we are about to consolidate this, centralizing these
> > calls via a configurable compilation chain because it really becomes a
> > maintenance nightmare (as recently seen when we reworked our thread-local
> > configuration management).

Agreed, we should include the necessary refactoring to centralize the
top-level control of the compilation chain into the work proposed in this
design doc.

> >
> > (3) Data Representation / Converters: Making data conversions and
> > input/output handling easier is certainly useful. However, isn't this new
> > class hierarchy redundant to our already existing hierarchy of Data,
> > MatrixObject, FrameObject, and ScalarObject?
> >
>
> Is there a way to hide some of the input/output registration calls by some
> convention over api calls ?
>
> >
> > (4) Consolidating MLContext and JMLC: This is a good idea since MLContext
> > is anyway "derived" from JMLC and both rely on the same concepts for
> > input/output handling and program modification (see JMLCUtils). Down the
> > road I would like to see a convergence to something like Script and
> > PreparedScript in the spirit of JDBC's Statement and PreparedStatement
> > (btw, that's how we created JMLC). Let's keep in mind that there are
> > already a number of users working against both MLContext and JMLC, so we
> > should support them separately until our major 1.0 release.
> >
> >
> > Regards,
> > Matthias
> >
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
