Hi Arnab,

Here is my list, feel free to remove elements 😊

Major:
- Refactor Compression package and add functions
- Add Quantization for lossy compression
- Generalize column groups to use same base dictionary
- Binary cell operations
- Left Matrix Multiplication
- GitHub actions for automated testing
- Improved compile times and packaging
- Docker containers for systemds, pythonsystemds and testingsystemds

Minor:
- Python PCA and MultiLogReg algorithms
- Parallel sort
- Parallel detect schema
- URL handler for federated
- Distinct values count / estimation function
- Simplified Log4J from being Hadoop based to our own
- Handle NaStrings in CSV reading frame and matrix
- Re-enable code coverage tools

Removed:
- GitHub pages for documentation (moved to master)
- Travis testing

Best regards
Sebastian

________________________________
From: arnab phani <phaniar...@gmail.com>
Sent: Monday, September 7, 2020 9:26:12 AM
To: dev@systemds.apache.org
Subject: Re: [DISCUSS] Apache SystemDS 2.0 Release

Thanks Kevin.
Other committers: once you get a chance, please send me your contributions too.

Regards,
Arnab..

On Wed, Sep 2, 2020 at 10:04 PM Kevin Innerebner <innereb...@student.tugraz.at> wrote:
> Hi,
>
> here are the changes I contributed after March 24:
>
> - Added SystemDSContext to Python API (now necessary for operations)
>
> - Added federated frames
>
> - Federated transform-encode, -decode and -apply (missing value
> imputation is still an ongoing PR, I think it will be merged in before
> release)
>
> - New builtin `colnames()` to get the column names of a frame
>
> That should be everything from my side.
>
> Regards,
> Kevin
>
> On 9/1/20 11:36 AM, arnab phani wrote:
> > Hi All,
> >
> > As we are nearing the release, I am starting to focus on the release notes.
> > Notes for SystemDS 2.0 release should consolidate all the things that
> > happened since Aug 2018 (last SystemML release).
> > While I will aggregate the notes from two SystemDS releases, it will be
> > great if you can update me with a few lines summarizing the additions to
> > your features (including the external contributions), especially after
> > March 24, 2020 (last SystemDS release).
> >
> > Once ready, I will share for everyone to have a look.
> >
> > Regards,
> > Arnab..
> >
> > On Mon, Aug 31, 2020 at 8:34 PM Matthias Boehm <mboe...@gmail.com> wrote:
> >
> >> thanks Arnab for looking over the remaining open issues. Together with
> >> Shafaq, we just came across two additional bugs related to eval function
> >> calls. These fixes should go into the RC and I intend to fix them as
> >> soon as possible.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 8/27/2020 8:41 PM, arnab phani wrote:
> >>> Hi All,
> >>>
> >>> Currently, I see only a few issues are flagged for 2.0 release. Can you
> >>> please go through your open issues and check if the Fix-Version is set?
> >>> Also, if a JIRA task doesn't exist for something you are working on or want
> >>> to have in the coming release, please open a task and flag it for 2.0.
> >>>
> >>> Regards,
> >>> Arnab..
> >>>
> >>> On Thu, Aug 20, 2020 at 8:18 PM Matthias Boehm <mboe...@gmail.com> wrote:
> >>>> as the target release date end of August comes closer, I'd like to share
> >>>> that Arnab Phani kindly volunteered in an offline discussion to act as
> >>>> the release manager for our 2.0 release.
> >>>>
> >>>> Please, flag issues and features you think are important for the 2.0
> >>>> release as such in JIRA so we can monitor them, discuss them on a case
> >>>> by case basis, and push the release date if necessary. Thanks.
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On 8/17/2020 2:51 PM, Janardhan wrote:
> >>>>> Hi,
> >>>>>
> >>>>> The following is the status of the MLContext test for algorithms.
> >>>>>
> >>>>> 1. l2svm, msvm, PCA - scripts are running + results are not equal to R
> >>>>> 2. Autoencoder, StepwiseReg - Scripts are not running
> >>>>> 3. KMeans, GLM (need to fix R) - No R script
> >>>>>
> >>>>> Thank you,
> >>>>> Janardhan
> >>>>>
> >>>>> On Fri, Jul 10, 2020 at 2:29 AM Matthias Boehm <mboe...@gmail.com> wrote:
> >>>>>> thanks for the perspective, I think we should be very pragmatic
> >>>>>> regarding languages. Let's stick to DML as our domain-specific language
> >>>>>> with R-like syntax, but add language bindings such as the Python API
> >>>>>> (and others) to seamlessly plug into common data science workflows. A
> >>>>>> similar mindset worked very well in the internals too: Java for nicely
> >>>>>> integrating with Hadoop/Spark and simplicity, but with C++ and CUDA
> >>>>>> kernels and native libraries where necessary.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Matthias
> >>>>>>
> >>>>>> On 7/9/2020 3:54 PM, Janardhan wrote:
> >>>>>>> DML - %*% seems more intuitive compared to @. Let us not change the syntax
> >>>>>>> (our selling point: easy porting to R!)
> >>>>>>> Python - no solid opinion
> >>>>>>>
> >>>>>>> - Janardhan
> >>>>>>>
> >>>>>>> On Thu, 9 Jul, 2020, 19:06 Matthias Boehm, <mboe...@gmail.com> wrote:
> >>>>>>>> for the Python API this is fine, for DML not as we should stick as close
> >>>>>>>> as possible to R syntax. Once we had a pydml syntax too, but this
> >>>>>>>> created lots of inconsistencies and could not use Python as a host
> >>>>>>>> language. So, I think restricting such changes to the Python API is a
> >>>>>>>> good path forward. Other opinions?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Matthias
> >>>>>>>>
> >>>>>>>> On 7/9/2020 3:31 PM, Baunsgaard, Sebastian wrote:
> >>>>>>>>> Hi all
> >>>>>>>>>
> >>>>>>>>> Can I suggest a radical change to matrix multiply:
> >>>>>>>>> changing the operator from %*% to @?
> >>>>>>>>>
> >>>>>>>>> Python has made this commitment!
> >>>>>>>>>
> >>>>>>>>> https://www.python.org/dev/peps/pep-0465/
> >>>>>>>>>
> >>>>>>>>> or at least change this in the Python API?
> >>>>>>>>>
> >>>>>>>>> Best regards
> >>>>>>>>> Sebastian
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>> From: Matthias Boehm <mboe...@gmail.com>
> >>>>>>>>> Sent: Wednesday, July 8, 2020 11:04:12 PM
> >>>>>>>>> To: dev@systemds.apache.org
> >>>>>>>>> Subject: [DISCUSS] Apache SystemDS 2.0 Release
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I'd like to propose Aug 31 as a target date for the SystemDS 2.0 release
> >>>>>>>>> (feature freeze August 21). This should give us enough time to figure
> >>>>>>>>> out the list of things that still should go into this release as it's an
> >>>>>>>>> opportunity of a major release for changes of external behavior. However, as
> >>>>>>>>> it's the first SystemDS Apache release, I think we should still stick to
> >>>>>>>>> Spark 2.x and Java 8 and consider upgrades of Spark and the JDK for
> >>>>>>>>> subsequent releases. So, what do you think and any major features you'd
> >>>>>>>>> like to see complete for 2.0?
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Matthias
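
For reference, the `@` operator discussed in this thread is the one introduced by PEP 465: in Python, `a @ b` dispatches to `a.__matmul__(b)`. A minimal sketch of how a Python-side binding could expose it is below. The `Matrix` class is purely hypothetical and is not part of the SystemDS Python API; it only illustrates the operator hook.

```python
class Matrix:
    """Hypothetical list-of-lists matrix; NOT the SystemDS Python API."""

    def __init__(self, rows):
        # Copy the input so later changes to the caller's lists do not leak in.
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # PEP 465: the expression `self @ other` calls this method.
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions must match")
        cols = list(zip(*other.rows))  # columns of the right-hand operand
        return Matrix(
            [[sum(x * y for x, y in zip(row, col)) for col in cols]
             for row in self.rows]
        )


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
c = a @ b  # plays the role of DML's  a %*% b
print(c.rows)
```

Because the hook lives entirely in the host-language class, DML itself could keep `%*%` while only the Python binding accepts `@`, which is the split suggested in the replies above.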