Hi Arnab, our team has contributed the following:
1. Thoroughly documented the builtin functions
2. Starter template for working with Databricks and Colab

Thank you,
Janardhan

On Mon, Sep 7, 2020 at 6:59 PM Shafaq Siddiqi <shafaq.sidd...@tugraz.at> wrote:
> Hi Arnab,
>
> The changes contributed by me are the following:
>
> Built-ins:
> - dropInvalidLength() and dropInvalidType(): frame built-ins for data cleaning using schema and length information.
> - glm(): Generalized Linear Model added as a built-in from our algorithms.
> - imputeFD(): for missing value imputation using robust functional dependencies.
> - Update in an existing built-in MICE (now works on matrices instead of frames).
> - map() for supporting lambda expressions.
> - smote(): an oversampling technique for class imbalance.
> - na_locf(): built-in for forward and backward NA filling.
> - gmm(): Gaussian mixture model (experimental feature)
>
> Binary Operations:
> - Comparison operations for frame-frame ops.
>
> Feel free to make any changes you deem necessary.
>
> Best Regards,
> Shafaq Siddiqi
>
> On 9/7/2020 9:51 AM, Baunsgaard, Sebastian wrote:
> > Hi Arnab,
> >
> > Here is my list, feel free to remove elements 😊
> >
> > Major:
> >
> > - Refactor Compression package and add functions
> > - add Quantization for lossy compression
> > - Generalize column groups to use same base dictionary
> > - Binary cell operations
> > - Left Matrix Multiplication
> > - GitHub actions for automated testing
> > - Improved Compile times, and packaging
> > - Docker containers for systemds, pythonsystemds and testingsystemds
> >
> > Minor:
> >
> > - python PCA and MultiLogReg algorithms
> > - parallel sort
> > - parallel detect schema
> > - URL handler for federated
> > - Distinct values count / estimation function
> > - Simplified Log4J from being Hadoop based to our own
> > - Handle NaStrings in CSV reading frame and matrix
> > - Re-enable code coverage tools
> >
> > Removed:
> >
> > - GitHub Pages for documentation (documentation moved to master)
> > - Travis testing
> >
> > Best regards
> >
> > Sebastian
> >
> > ________________________________
> > From: arnab phani <phaniar...@gmail.com>
> > Sent: Monday, September 7, 2020 9:26:12 AM
> > To: dev@systemds.apache.org
> > Subject: Re: [DISCUSS] Apache SystemDS 2.0 Release
> >
> > Thanks Kevin.
> >
> > Other committers: once you get a chance, please send me your contributions too.
> >
> > Regards,
> > Arnab..
> >
> > On Wed, Sep 2, 2020 at 10:04 PM Kevin Innerebner <innereb...@student.tugraz.at> wrote:
> >
> >> Hi,
> >>
> >> here are the changes I contributed after March 24:
> >>
> >> - Added SystemDSContext to python api (now necessary for operations)
> >>
> >> - Added federated frames
> >>
> >> - Federated transform-encode, -decode and -apply (missing value imputation is still an ongoing PR, I think it will be merged in before release)
> >>
> >> - New builtin `colnames()` to get the column names of a frame
> >>
> >> That should be everything from my side.
> >>
> >> Regards,
> >> Kevin
> >>
> >> On 9/1/20 11:36 AM, arnab phani wrote:
> >>> Hi All,
> >>>
> >>> As we are nearing the release, I am starting to focus on the release notes.
> >>> Notes for SystemDS 2.0 release should consolidate all the things that happened since Aug 2018 (last SystemML release).
> >>> While I will aggregate the notes from two SystemDS releases, it will be great if you can update me with a few lines summarizing the additions to your features (including the external contributions), especially after March 24, 2020 (last SystemDS release).
> >>>
> >>> Once ready, I will share for everyone to have a look.
> >>>
> >>> Regards,
> >>> Arnab..
> >>>
> >>> On Mon, Aug 31, 2020 at 8:34 PM Matthias Boehm <mboe...@gmail.com> wrote:
> >>>> thanks Arnab for looking over the remaining open issues. Together with Shafaq, we just came across two additional bugs related to eval function calls.
> >>>> These fixes should go into the RC and I intend to fix them as soon as possible.
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 8/27/2020 8:41 PM, arnab phani wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> Currently, I see that only a few issues are flagged for the 2.0 release. Can you please go through your open issues and check if the Fix-Version is set?
> >>>>> Also, if a JIRA task doesn't exist for something you are working on or want to have in the coming release, please open a task and flag it for 2.0.
> >>>>>
> >>>>> Regards,
> >>>>> Arnab..
> >>>>>
> >>>>> On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mboe...@gmail.com> wrote:
> >>>>>> as the target release date of end of August comes closer, I'd like to share that Arnab Phani kindly volunteered in an offline discussion to act as the release manager for our 2.0 release.
> >>>>>>
> >>>>>> Please flag issues and features you think are important for the 2.0 release as such in JIRA so we can monitor them, discuss them on a case-by-case basis, and push the release date if necessary. Thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Matthias
> >>>>>>
> >>>>>> On 8/17/2020 2:51 PM, Janardhan wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> The following is the status of the MLContext test for algorithms.
> >>>>>>>
> >>>>>>> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
> >>>>>>> 2. Autoencoder, StepwiseReg - Scripts are not running
> >>>>>>> 3. KMeans, GLM (need to fix R) - No R script
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Janardhan
> >>>>>>>
> >>>>>>> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mboe...@gmail.com> wrote:
> >>>>>>>> thanks for the perspective, I think we should be very pragmatic regarding languages.
> >>>>>>>> Let's stick to DML as our domain-specific language with R-like syntax, but add language bindings such as the Python API (and others) to seamlessly plug into common data science workflows. A similar mindset worked very well in the internals too: Java for nicely integrating with Hadoop/Spark and simplicity, but with C++ and CUDA kernels and native libraries where necessary.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Matthias
> >>>>>>>>
> >>>>>>>> On 7/9/2020 3:54 PM, Janardhan wrote:
> >>>>>>>>> DML - %*% seems more intuitive compared to @. Let us not change the syntax (our selling point: easy porting to R!)
> >>>>>>>>> Python - no solid opinion
> >>>>>>>>>
> >>>>>>>>> - Janardhan
> >>>>>>>>>
> >>>>>>>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mboe...@gmail.com> wrote:
> >>>>>>>>>> for the Python API this is fine, but not for DML, as we should stick as close as possible to R syntax. Once we had a pydml syntax too, but this created lots of inconsistencies and could not use Python as a host language. So, I think restricting such changes to the Python API is a good path forward. Other opinions?
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Matthias
> >>>>>>>>>>
> >>>>>>>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
> >>>>>>>>>>> Hi all
> >>>>>>>>>>>
> >>>>>>>>>>> Can I suggest a radical change to matrix multiply: changing the operator from %*% to @?
> >>>>>>>>>>>
> >>>>>>>>>>> Python has made this commitment!
> >>>>>>>>>>>
> >>>>>>>>>>> https://www.python.org/dev/peps/pep-0465/
> >>>>>>>>>>>
> >>>>>>>>>>> Or at least change this in the Python API?
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards
> >>>>>>>>>>>
> >>>>>>>>>>> Sebastian
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>> From: Matthias Boehm <mboe...@gmail.com>
> >>>>>>>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
> >>>>>>>>>>> To: dev@systemds.apache.org
> >>>>>>>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
> >>>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0 release (feature freeze August 21). This should give us enough time to figure out the list of things that still should go into this release, as a major release is an opportunity for changes of external behavior. However, as it's the first SystemDS Apache release, I think we should still stick to Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for subsequent releases. So, what do you think, and are there any major features you'd like to see completed for 2.0?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Matthias