Thanks for initiating this discussion; there are indeed a couple of
things we need to clean up. Just for the future, please ask before
adding even more to this diversity (I understand you just recently
changed the GitHub summary proactively without such a discussion).
ad 1) DML stands for Declarative ML Language. Its design philosophy
is based on declarative specification in the sense of providing data
independence (abstract data types, no hard-coding of
dense/sparse/compressed formats) and implementation-agnostic operations
(no hard-coding of local vs. distributed vs. federated vs. HW accelerator
operations).
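To make the data-independence point concrete, here is a minimal Python
sketch. It is not SystemDS code: the class names, `choose_format`, and the
50% sparsity threshold are hypothetical. User-level operations see only an
abstract matrix type, and the "system" decides the physical format.

```python
from abc import ABC, abstractmethod


class Matrix(ABC):
    """Abstract matrix type: user code never names the physical format."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols

    @abstractmethod
    def get(self, i, j):
        """Return the value at row i, column j."""


class DenseMatrix(Matrix):
    def __init__(self, data):
        super().__init__(len(data), len(data[0]))
        self.data = data  # list of rows

    def get(self, i, j):
        return self.data[i][j]


class SparseMatrix(Matrix):
    def __init__(self, rows, cols, entries):
        super().__init__(rows, cols)
        self.entries = entries  # dict mapping (i, j) -> nonzero value

    def get(self, i, j):
        return self.entries.get((i, j), 0)


def choose_format(data):
    """Hypothetical system decision: sparse if fewer than half the cells are nonzero."""
    rows, cols = len(data), len(data[0])
    nnz = sum(1 for row in data for v in row if v != 0)
    if nnz < 0.5 * rows * cols:
        entries = {(i, j): v for i, row in enumerate(data)
                   for j, v in enumerate(row) if v != 0}
        return SparseMatrix(rows, cols, entries)
    return DenseMatrix(data)


def matmul(a, b):
    """Written only against the abstract Matrix interface."""
    if a.cols != b.rows:
        raise ValueError("shape mismatch")
    out = [[sum(a.get(i, k) * b.get(k, j) for k in range(a.cols))
            for j in range(b.cols)] for i in range(a.rows)]
    return choose_format(out)


def to_list(m):
    """Materialize any Matrix as nested lists, regardless of its format."""
    return [[m.get(i, j) for j in range(m.cols)] for i in range(m.rows)]
```

For example, `matmul(choose_format([[1, 0], [0, 0]]), choose_format([[1, 2], [3, 4]]))`
multiplies a sparse by a dense input without the caller ever choosing a
format; the same `matmul` would work unchanged if another backend were added.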
ad 2) When merging SystemDS into Apache SystemDS, I changed the JIRA
summary to "Apache SystemDS - An open source ML system for the
end-to-end data science lifecycle", and I still like this one best because
we want a stable name, independent of trends in the underlying
execution models. As a side note, I always disliked the phrase "A machine
learning platform optimal for big data" (the use of "optimal" and the "big
data" wording). However, this is just my opinion, and I think it's a good
idea to discuss this once and for all (for the foreseeable future at
least). Any thoughts?
Regards,
Matthias
On 5/18/2021 4:18 PM, Janardhan wrote:
Hi all,
We are using different descriptions in various places. It would be better
to define each term more clearly. Sorry if I am asking something
obvious.
1. Which one should we use as the project description?
Note: The description given in the SystemDS research paper could be
considered, although the paper was published before the merge into SystemML.
2. Also, what is the full form of DML?
a. Declarative Machine Learning Language
b. Descriptive Machine Learning Language
c. ..
Research paper [1]:
SystemDS: A Declarative Machine Learning System for the End-to-End Data
Science Lifecycle
GitHub:
Apache SystemDS - A versatile system for the end-to-end data science
lifecycle
PyPI:
SystemDS is a distributed and declarative machine learning platform.
systemds.apache.org:
A machine learning platform optimal for big data
Jira:
Apache SystemDS - An open source ML system for the end-to-end data science
lifecycle
---
Major parts of the SystemDS game plan [1]:
1. DSL-based, High-level Abstractions: We aim to provide a hierarchy of
abstractions for the different lifecycle tasks as well as for users with
different expertise.
2. Hybrid Runtime Plans and Optimizing Compiler: To support the wide
variety of algorithm classes, we will continue to provide different
parallelization strategies, enriched by a new backend for federated ML
and privacy enhancing technologies.
3. Data Model - Heterogeneous Tensors: Supporting data integration and
cleaning primitives in linear algebra programs requires a more generic
data model for handling heterogeneous and structured data. In contrast to
existing ML systems, our central data model is heterogeneous tensors.
[1] https://arxiv.org/abs/1909.02976
[2] Roadmap discussion - https://s.apache.org/systemds-roadmap
Thank you,
Janardhan