I also agree that internal data structures shouldn't be exposed to users. However, in agreement with Arvind, I think we definitely need to keep the `Matrix` and `Frame` types in the API. The main purpose of SystemML for a user is to run machine learning algorithms involving matrices on a given system (laptop, Spark cluster, etc.). Anything that directly involves a compilation chain is noise for our ML users. It is therefore quite useful for SystemML to expose a `Matrix` type with a limited API, as MLContext currently does. A user can then interact with SystemML via these `Matrix` objects, which abstractly represent the core data structure of a SystemML script. Furthermore, these `Matrix` objects can be passed as input to a subsequent script, or converted to a DataFrame once the user is ready to continue working in Spark. As Arvind mentioned, this simply exposes the DML `Matrix` type at the API level as well. Additionally, we plan to unify this `Matrix` type with the lazy matrix types we are creating in the Python and Scala DSLs, so that `Matrix` becomes the equivalent of matrices in DML. A similar argument applies to `Frame`.
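To make the handle idea concrete, here is a minimal sketch in plain Python. It is not the MLContext API: every name in it (`_InternalMatrixObject`, `Matrix`, `run_script`) is hypothetical and only stands in for the roles discussed above, namely an internal structure that stays hidden, a small public `Matrix` handle, and a script runner that accepts either raw data or a `Matrix` returned by an earlier run.

```python
from typing import Dict, List, Tuple, Union

Rows = List[List[float]]


class _InternalMatrixObject:
    """Hypothetical stand-in for an engine-internal structure (not public API)."""

    def __init__(self, rows: Rows):
        self._rows = [list(r) for r in rows]


class Matrix:
    """Hypothetical public handle with a deliberately small surface."""

    def __init__(self, internal: _InternalMatrixObject):
        self._internal = internal  # private: callers never touch the internal object

    @property
    def shape(self) -> Tuple[int, int]:
        rows = self._internal._rows
        return (len(rows), len(rows[0]) if rows else 0)

    def to_rows(self) -> Rows:
        """Export a copy of the data, analogous to converting to a DataFrame."""
        return [list(r) for r in self._internal._rows]


def run_script(scale: float, inputs: Dict[str, Union[Rows, Matrix]]) -> Dict[str, Matrix]:
    """Hypothetical 'script' that computes Y = scale * X and returns a Matrix handle."""
    x = inputs["X"]
    rows = x.to_rows() if isinstance(x, Matrix) else x
    out = _InternalMatrixObject([[scale * v for v in r] for r in rows])
    return {"Y": Matrix(out)}


# The output of the first script is fed directly into the second one.
first = run_script(2.0, {"X": [[1.0, 2.0], [3.0, 4.0]]})
second = run_script(10.0, {"X": first["Y"]})
result = second["Y"].to_rows()
```

In this sketch the caller only ever sees `Matrix`, so the internal class can change without breaking user code, which is the stability concern raised in the thread.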
I think that limiting the exposure of internal structures to users could be useful, but removing `Matrix` and `Frame` and instead having users deal directly with compilation chains would be a step backwards.

- Mike

--
Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

On Sun, Sep 11, 2016 at 5:52 PM, Acs S <[email protected]> wrote:

> Yes, I agree that we should NOT expose any internal objects at API level. Objects like FrameObject, MatrixObject should not be exposed as those are internal objects.
>
> Rule of thumb should be if object (Frame, Object or Scalar) is exposed at DML level it should be exposed at MLContext level. If there is a need to add any extra object besides those exposed in DML, it should be justifiable with a rationale.
>
> I introduced FrameObject as an oversight. It should have been a private method instead of a public method. I can fix it soon. But there are more changes you have proposed; I will let Deron respond.
>
> Thanks for catching these issues.
>
> -Arvind
>
> From: Matthias Boehm <[email protected]>
> To: dev <[email protected]>
> Sent: Sunday, September 11, 2016 9:43 AM
> Subject: Simplification of MLContext and related APIs
>
> It's great to see the ongoing progress on MLContext and related APIs. However, one aspect that really concerns me is the creation of many redundant data types and the exposure of various internal data structures. For example, exposing MatrixObject and FrameObject at API level is dangerous because it makes external programs data-dependent on internal structures that might be subject to change (no API stability), and users might not be aware of the implications their interactions have on the buffer pool etc. Furthermore, having such a plethora of entry points makes it very hard to ensure consistency of the compilation chain with regard to configuration handling, environment setup and advanced compilation techniques.
>
> I would recommend creating a holistic design across the various APIs that aims to (1) reduce the number of exposed data types (for instance, I would like to remove MatrixObject/FrameObject from the external interface, as well as remove BinaryBlockMatrix, BinaryBlockFrame, Matrix, Frame, and related meta data objects), and (2) create a configurable compilation chain that is invoked from all external APIs. I understand that these data types were introduced to simplify, for example, imports in user programs, but I'm sure we can find an alternative realization with less redundancy. What do you think?
>
> Regards,
> Matthias
