Re: Simplification of MLContext and related APIs

Mike Dusenberry Mon, 12 Sep 2016 13:23:25 -0700

I also agree that internal data structures shouldn't be exposed to a user.
However, I think we definitely need to keep the `Matrix` and `Frame` types
in the API, in agreement with Arvind.  The main purpose of SystemML for a
user is to allow for machine learning algorithms involving matrices to be
run on a given system (laptop, Spark cluster, etc.).  Anything involving a
compilation chain directly is noise for our ML users.  Thus it's quite
useful for SystemML to expose a `Matrix` type with a limited API as is
currently done in MLContext.  This allows a user to interact with SystemML
via these `Matrix` objects which abstractly represent the core data
structure of a SystemML script.  Furthermore, these Matrix objects can be
used as subsequent input to an additional script, or can be converted to a
DataFrame once the user is ready to continue interacting with Spark.  As
Arvind mentioned, this just allows the DML `Matrix` type to be effectively
exposed at the API level as well.  Additionally, we plan to unify this
`Matrix` type with the lazy matrix types we are creating in the Python and
Scala DSLs, thus allowing `Matrix` to be the equivalent of matrices in
DML.  A similar argument applies to `Frame` as well.
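
To make the idea concrete, here is a minimal Python sketch of such a handle. Every name in it (`_InternalMatrixObject`, `Matrix`, `run_script`, `to_rows`) is hypothetical and is not the MLContext API; plain Python lists stand in for both the internal runtime object and a Spark DataFrame, and ordinary functions stand in for DML scripts.

```python
# Hypothetical sketch only: none of these names are actual SystemML classes.

class _InternalMatrixObject:
    """Stand-in for an internal runtime structure that may change freely."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]


class Matrix:
    """Public handle with a deliberately small API."""

    def __init__(self, rows):
        # The internal object is held privately and never handed to the user.
        self._obj = _InternalMatrixObject(rows)

    def to_rows(self):
        # Stand-in for "convert to a DataFrame": returns a defensive copy.
        return [list(r) for r in self._obj._rows]


def run_script(fn, *inputs):
    """Stand-in for executing a script: takes Matrix handles, returns a Matrix."""
    args = [m.to_rows() for m in inputs]
    return Matrix(fn(*args))


def transpose(rows):
    return [list(col) for col in zip(*rows)]


def scale_by_two(rows):
    return [[2 * v for v in row] for row in rows]


X = Matrix([[1, 2], [3, 4]])
Xt = run_script(transpose, X)        # output of one "script" ...
Y = run_script(scale_by_two, Xt)     # ... used as input to the next
result = Y.to_rows()                 # leave the handle only when ready
```

The user only ever touches `Matrix`, so the internal class could be renamed or restructured without breaking this calling code.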


I think that limiting the exposure of internal structures to users could be
useful, but removing `Matrix` & `Frame` and instead having a user deal
directly with compilation chains would be a step backwards.

- Mike

--

Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

On Sun, Sep 11, 2016 at 5:52 PM, Acs S <[email protected]> wrote:

> Yes, I agree that we should NOT expose any internal objects at the API
> level. Objects like FrameObject and MatrixObject should not be exposed, as
> those are internal objects.
> The rule of thumb should be: if an object (Frame, Matrix or Scalar) is
> exposed at the DML level, it should be exposed at the MLContext level. If
> there is a need to add any extra object beyond those exposed in DML, it
> should be justified with a rationale.
> I introduced FrameObject as an oversight. It should have been a private
> method instead of a public method. I can fix it soon. But since you have
> proposed more changes, I will let Deron respond.
> Thanks for catching these issues.
> -Arvind
>
>       From: Matthias Boehm <[email protected]>
>  To: dev <[email protected]>
>  Sent: Sunday, September 11, 2016 9:43 AM
>  Subject: Simplification of MLContext and related APIs
>
>
>
> It's great to see the ongoing progress on MLContext and related APIs.
> However, one aspect that really concerns me is the creation of many
> redundant data types and exposure of various internal data structures.
> For example, exposing MatrixObject and FrameObject at API level is
> dangerous because it makes external programs data-dependent on internal
> structures that might be subject to change (no API stability) and users
> might not be aware of the implications their interactions have on the
> buffer pool etc. Furthermore, having such a plethora of entry points makes
> it very hard to ensure consistency of the compilation chain with regard to
> configuration handling, environment setup and advanced compilation
> techniques.
>
> I would recommend creating a holistic design across the various APIs that
> aims to (1) reduce the number of exposed data types (for instance, I would
> like to remove MatrixObject/FrameObject from the external interface, as
> well as remove BinaryBlockMatrix, BinaryBlockFrame, Matrix, Frame, and
> related meta data objects), and (2) create a configurable compilation chain
> that is invoked from all external APIs. I understand that these data types
> were introduced to simplify, for example, imports in user programs but I'm
> sure we can find an alternative realization with less redundancy. What do you
> think?
>
> Regards,
> Matthias
>
>
>
