Re: Roadmap Merge and Rename SystemDS

Matthias Boehm Fri, 10 Apr 2020 05:18:46 -0700

yes, all that will be covered, but there are official processes to follow:

ad 1) yesterday, the podling ticket for the suitable name search was approved, with additional comments to the PMC.

https://issues.apache.org/jira/projects/PODLINGNAMESEARCH/issues/PODLINGNAMESEARCH-179?filter=allissues

ad 2) we follow the official new committer process, and there are still additional steps to do

http://community.apache.org/newcommitter.html

ad 3) It's not decided yet whether we go directly to 2.0, in order to avoid tag conflicts, or to 0.3. Feel free to express your opinion here too.


Regards,
Matthias

On 4/10/2020 4:39 AM, Janardhan wrote:

Hi Matthias,

    Would you be so kind as to announce the following:
1. Apache Infra JIRA ticket for the name change
2. New committers (welcome!) and of course contributors
3. New release version number (is it SYSTEMDS-0.3.0-SNAPSHOT?)

Thank you,
Janardhan

On Tue, Mar 24, 2020 at 6:28 PM Matthias Boehm <mboe...@gmail.com> wrote:

that's a good point Henry. Yes, with SystemDS 0.1.0, we removed the
MapReduce compiler and runtime backend, the PyDML parser and language
support, the Java-UDF framework, and the script-level debugger. We are
now concentrating on the local, Spark, GPU, and federated backends, and
have added new language bindings, including an initial Python binding.
However, the script-level operation support remains intact and is even
largely extended by builtins for algorithms, data cleaning, and
debugging.

Accordingly, it might be good to deprecate the removed things while
merging the code in and then make the next Apache SystemDS (pending
approval) release a major release which allows us to break external APIs.

Regards,
Matthias

On 3/24/2020 2:07 AM, Henry Saputra wrote:

Thanks for starting this discussion, Matthias.

Are there any features from SystemML that could be removed or deprecated
when SystemDS is merged into the SystemML repository?

- Henry

On Sat, Mar 21, 2020 at 2:47 PM Matthias Boehm <mboe...@gmail.com> wrote:

just FYI, we created a ticket for the suitable name search, and shared
the related results [1]. So from my perspective, it really boils down to
the question if we accept the closeness to 'Linux systemd'. Back in 2018
(when starting SystemDS), I came to the conclusion that it's fine
because of the very different objectives and because SystemDS reflects
both the origin from SystemML and its new focus on data science
pipelines.

[1] https://issues.apache.org/jira/projects/PODLINGNAMESEARCH/issues/PODLINGNAMESEARCH-179?filter=allissues


Regards,
Matthias

On 3/9/2020 6:37 PM, Matthias Boehm wrote:

Hi all,

as you're probably aware, development activities of Apache SystemML
significantly slowed down and were virtually non-existent in the last
year for various reasons. Part of that was that my team and I [1]
decided to start SystemDS [2,3] as a fork of SystemML in 09/2018 with a
new vision and roadmap for the future.

During PMC discussions regarding the retirement of SystemML, we came to
the conclusion that the best path forward -- for the entire community
-- would be to merge SystemDS back into Apache SystemML, rename it to
SystemDS, and continue jointly. Before doing so, I want to share the
plan with the entire community.

SystemDS aims at providing better systems support for the end-to-end
data science lifecycle, with a special focus on ML pipelines from data
integration, cleaning, and preparation, over efficient ML model
training, to model debugging and serving. A key observation is that
state-of-the-art data integration and cleaning primitives are themselves
based on machine learning. Our main objectives are to support effective
and efficient data preparation, ML training and debugging at scale,
something that cannot be composed from existing libraries. The game plan
includes three major parts:

1) DSL-based, High-level Abstractions: We aim to provide a hierarchy of
abstractions for the different lifecycle tasks as well as users with
different expertise (ML researchers, data scientists, domain experts),
based on our DSL for ML training and scoring. Exploratory data science
interleaves data preparation, ML training, scoring, and debugging in an
iterative process; and once these tasks are expressed in dense or sparse
linear algebra, we expect very good performance.

2) Hybrid Runtime Plans and Optimizing Compiler: To support the wide
variety of algorithm classes, we will continue to provide different
parallelization strategies, enriched by a new backend for federated ML
and privacy enhancing technologies. Since the hierarchy of language
abstractions inevitably leads to redundancy, we further aim to improve
the automatic optimization capabilities of the compiler and underlying
runtime.

3) Data Model - Heterogeneous Tensors: Supporting data integration and
cleaning primitives in linear algebra programs requires a more generic
data model for handling heterogeneous and structured data. In contrast
to existing ML systems, our central data model is heterogeneous
tensors. Thus, we generalize SystemML's FP64 matrices to
multi-dimensional arrays where one dimension may have a schema,
including JSON strings to represent nested data.

Admin: We intend to create the SystemDS 0.2 release in March. Afterwards,
we would rebase all our commits (369) back onto the SystemML
codeline. Subsequently, we will rename Apache SystemML to Apache
SystemDS and continue our development under the Apache umbrella. I just
went through the Apache name search guidelines and we'll perform a
'suitable name search' accordingly and then transfer SystemDS. The
existing PMC and committer status of course stays intact unless people
want to leave.

Shortly after the merge, I will nominate the four most active
contributors of the last year to become committers. Regarding releases
(and JIRA numbers), it's up for discussion, but both options, continuing
with SystemML versions (i.e., 1.3) or with SystemDS versions (0.3), seem
fine to me.


Roadmap: At the technical level, SystemDS will continue to support all
operations and algorithms SystemML provided but significantly extend the
scope and functionality via the mentioned hierarchy of language
abstractions (in the form of builtin functions). However, during the fork
we already removed old baggage like the MR backend, the script-level
debugger, the PyDML frontend and several other things [4]. Major new
internals are native support for lineage tracing and reuse, the data
model of heterogeneous tensors, and a new federated backend.

[1] https://damslab.github.io/
[2] https://github.com/tugraz-isds/systemds
[3] http://cidrdb.org/cidr2020/papers/p22-boehm-cidr20.pdf
[4] https://github.com/tugraz-isds/systemds/releases/tag/v0.1.0

Regards,
Matthias
