Re: [DISCUSS] Apache SystemDS 2.0 Release

arnab phani Thu, 10 Sep 2020 02:25:24 -0700

Thank you all for the notes.
Please find the consolidated release notes below, and please let me know if
anything major is missing.


*Release notes for SystemDS 2.0.*

SystemDS 2.0 is the first major release under the new name. This release
contains a major refactoring, a few major features, a large number of
improvements and fixes, and some experimental features to better support
the end-to-end data science lifecycle. This release also removes several
outdated features that were not up to the mark.

The major changes (compared to SystemML 1.2) include:


   - New mechanism for DML-bodied (script-level) builtin functions, and a
   wealth of new built-in functions for data preprocessing including data
   cleaning, augmentation and feature engineering techniques, new ML
   algorithms, and model debugging.
   - Several methods for data cleaning have been implemented, including
   multiple imputation using multivariate imputation by chained equations
   (MICE) and other techniques, SMOTE (an oversampling technique for class
   imbalance), forward and backward NA filling, cleaning using schema and
   length information, outlier detection using standard deviation and
   inter-quartile range, and functional dependency discovery.
   - A complete framework for lineage tracing and reuse, including support
   for loop deduplication, full and partial reuse, compiler-assisted reuse,
   and several new rewrites to facilitate reuse.
   - New federated runtime backend, including support for federated matrices
   and frames, and federated builtins (transform-encode, decode, etc.).
   - Refactored compression package with new functionality, including
   quantization for lossy compression, binary cell operations, and left
   matrix multiplication.
   - New Python bindings with support for several builtins, matrix
   operations, federated tensors, and lineage traces.
   - CUDA implementation of cumulative aggregate operators (cumsum, cumprod,
   etc.).
   - New model debugging technique with slice finder.
   - New tensor data model (basic tensors of different value types, data
   tensors with schema) [experimental]
   - Cloud deployment scripts for AWS and scripts to set up and start
   federated operations.
   - Performance improvements for parallel sort, GPU cumulative aggregates,
   append (cbind), etc.
   - Various compiler and runtime improvements, including new and improved
   rewrites, reduced Spark context creation, a new eval framework, list
   operations, and updated native kernel libraries, to name a few.
   - New data reader/writer for JSON frames and support for SQL as a data
   source.
   - Miscellaneous improvements: improved documentation, better testing,
   run/release scripts, improved packaging, a Docker container for SystemDS,
   and bug fixes.
   - Removed the MapReduce compiler and runtime backend, the PyDML parser,
   the Java-UDF framework, and the script-level debugger.
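
For readers unfamiliar with some of the cleaning techniques named in the list
above, here is a minimal sketch in plain Python of forward/backward NA filling
and of outlier detection by standard deviation and by inter-quartile range.
This is not SystemDS code and does not reflect the API of the new builtins;
the function names, the thresholds (k), and the use of None as the missing
value marker are all assumptions made for illustration.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical helper names; None stands in for a missing (NA) value.

def ffill(values):
    """Forward fill: replace each None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first value has been seen
    return filled

def bfill(values):
    """Backward fill: replace each None with the next non-missing value."""
    return ffill(values[::-1])[::-1]

def outliers_sd(values, k=3.0):
    """Flag values lying more than k standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > k * sigma for v in values]

def outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]
```

Leading missing values survive a forward fill and trailing ones survive a
backward fill, so the two are often applied one after the other.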


Regards,
Arnab.


On Tue, Sep 8, 2020 at 4:10 AM Mark Dokter <mdok...@know-center.at> wrote:

> On 01.09.20 11:36, arnab phani wrote:
> > While I will aggregate the notes from two SystemDS releases, it will be
> > great if you can update me with a few lines summarizing the additions to
> > your features (including the external contributions), especially after
> > March 24, 2020 (last SystemDS release).
>
> Hi Arnab!
>
> My contributions:
>
> - new run script
> - improve/simplify release scripts
> - various release related things (improve documentation, fix license
> headers, clean up pom.xml, etc)
> - cuda implementation of cumulative aggregate operators (cumsum,
> cumprod, etc)
> - bug fixes here and there
> - maintain native blas support in a working state (now also supporting
> windows)
> - kmeans builtin dml function
> - builtins for image augmentation
>
> Best,
> Mark
>
