The ability to call MLContext from Python would be a starting point for
users to adopt SystemDS in their pipelines for ML workloads (for at least
part of the pipeline, if not a major portion!).
A case for an MLContext Python wrapper:
{{code}}
# sc (SparkContext), ml (MLContext) and pnmf (DML script string) are
# assumed to be defined elsewhere.
# 1. Data processing with Spark
import pyspark.sql.functions as F
dataPath = "amazon0601.txt"
X_train = (sc.textFile(dataPath)
           .filter(lambda l: not l.startswith("#"))
           ...)
# 2. ML training with SystemDS
script = dml(pnmf).input(X=X_train, max_iter=100, rank=10).output("W", "H", "losses")
losses = ml.execute(script).get("losses")
# 3. Work with the results using Spark operations
xy = losses.toDF().sort("__INDEX").rdd.map(lambda r: (r[0], r[1])).collect()
{{code}}
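For readers without a Spark session at hand, steps 1 and 3 can be mimicked in plain Python. This is only a rough sketch on hypothetical in-memory data: the sample lines, the loss values and the `C1` column name are assumptions for illustration, and nothing here calls Spark or SystemDS.

```python
# Hypothetical stand-ins for the file contents; not the real amazon0601.txt.
lines = [
    "# Directed graph: amazon0601.txt",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "1\t0",
]

# Step 1 analogue: drop comment lines, like the filter(...) call above.
edges = [l for l in lines if not l.startswith("#")]

# Step 3 analogue: rows keyed by __INDEX (values are made up), sorted by
# index and collected as (index, loss) tuples.
rows = [
    {"__INDEX": 3, "C1": 0.8},
    {"__INDEX": 1, "C1": 2.5},
    {"__INDEX": 2, "C1": 1.4},
]
xy = [(r["__INDEX"], r["C1"]) for r in sorted(rows, key=lambda r: r["__INDEX"])]

print(edges)
print(xy)
```

The resulting `xy` list has the same shape as the one collected in step 3, so it could be handed directly to a plotting library.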
Discussion PR: https://github.com/apache/systemds/pull/1024
Thank you,
Janardhan
On Wed, Aug 12, 2020 at 6:51 PM Matthias Boehm <[email protected]> wrote:
> but just to be clear, there is still room for adding a new Python
> MLContext too (with APIs similar to our current one).
>
> Regards,
> Matthias
>
> On 8/12/2020 2:43 PM, Baunsgaard, Sebastian wrote:
> >
> > Hi Janardhan,
> >
> >
> > The Python interface currently does not run with Spark, partially
> > because we are focusing on federated execution.
> > The future intention is to not put the Spark controls inside the Python
> > API, since it is the Java execution that should handle Spark.
> >
> > I think this is an upgrade, since handling Spark from both the
> > Python API and internally in SystemDS can be confusing.
> >
> >
> > best regards
> >
> > Sebastian
> >
> > ________________________________
> > From: Janardhan <[email protected]>
> > Sent: Wednesday, August 12, 2020 2:22:52 PM
> > To: [email protected]
> > Subject: Re: [QUESTION] About MLContext. Thanks.
> >
> > Hi Sebastian,
> >
> > What is the equivalent of the following snippet in the SystemDS Python API?
> >
> > >>> from systemml import MLContext
> > >>> ml = MLContext(spark)
> > Welcome to Apache SystemML!
> > Version 1.0.0-SNAPSHOT
> >
> >
> > I looked at `l2svm`, but do we have a generic object to which we could
> > pass the Spark session?
> >
> > Thank you,
> > Janardhan
> >
> >
> > On Mon, Jul 20, 2020 at 1:47 PM Janardhan <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> We have a bit of a backlog in MLContext test coverage. This one
> >> is open for discussion.
> >>
> >> The preliminary work [1] reuses the existing R scripts from codegen
> >> testing for MLContext testing.
> >>
> >> Deciding on this at least before the release date would help our team
> >> add algorithms one by one, up to the set of algorithms covered by codegen.
> >>
> >> [1] https://github.com/apache/systemds/pull/997
> > [SYSTEMDS-1863] Full MLContext test for LinearReg by j143 · Pull Request
> > #997 · apache/systemds · GitHub
> > Takes advantage of existing R algorithm scripts used for codegen
> > testing. This would improve the testing by allowing us to provide all the
> > necessary inputs into the script.
> >
> >
> >>
> >> Thank you,
> >> Janardhan
> >>
> >> On Fri, Jul 17, 2020 at 10:07 PM Janardhan <[email protected]> wrote:
> >>
> >>> Hi, regarding SYSTEMDS-2572: the MLContext API is not loading into
> >>> spark-shell. Of course, everything else works fine!
> >>>
> >>> Thank you,
> >>> Janardhan
> >>>
> >>> On Fri, 17 Jul, 2020, 18:19 Matthias Boehm, <[email protected]> wrote:
> >>>
> >>>> yes, these nn scripts are packed into our SystemDS jar and can be used
> >>>> (via imports) in MLContext scripts as well. If there are issues, please
> >>>> feel free to file a bug report.
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 7/17/2020 11:12 AM, Janardhan wrote:
> >>>>> Hi, is the `nn` library provided with MLContext?
> >>>>>
> >>>>> Thank you,
> >>>>> Janardhan
> >>>>>
> >>>>
> >>>
> >
>