Thank you both for your thoughtful comments. Agreed: we should not force parity; rather, we should make sure that SystemDS built-in functions "cover" important use cases. I will start with an audit of SystemDS's existing capabilities and create a PR on systemds-website <https://github.com/apache/systemds-website> with my findings. This would also be a good way to identify gaps in the documentation for existing builtins so we can update it.
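
For illustration, here is a minimal sketch of how the audit findings could be recorded as a capability matrix, using the checklist columns proposed earlier in this thread. The class and function names (`Capability`, `gaps`, `report`) are hypothetical, and the entry `f_1` with its values is a placeholder, not an actual audit result.

```python
from dataclasses import dataclass, fields


@dataclass
class Capability:
    """One row of the matrix; columns follow the checklist proposed in this thread."""
    dml: bool = False
    python: bool = False
    cp: bool = False
    sp: bool = False
    row_col: bool = False
    row: bool = False
    col: bool = False
    federated: bool = False
    documentation_published: bool = False

    def gaps(self):
        # Names of all checklist columns that are not yet covered.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical layout: library -> function -> Capability.
# "f_1" is a placeholder name, not a real builtin.
matrix = {
    "numpy": {"f_1": Capability(dml=True, python=True, cp=True)},
    "sklearn": {},
}


def report(matrix):
    # One summary line per audited function, listing its open gaps.
    lines = []
    for library, functions in matrix.items():
        for name, cap in functions.items():
            missing = ", ".join(cap.gaps()) or "nothing"
            lines.append(f"{library}.{name}: missing {missing}")
    return lines


for line in report(matrix):
    print(line)
```

Each gap reported this way could then map to one JIRA task or documentation update.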
Thanks,
Badrul

On Tue, 2 Aug 2022 at 06:12, arnab phani <phaniar...@gmail.com> wrote:

> In my understanding, parity matters if 1) frameworks share a similar
> user base and use cases (sklearn, pandas, etc.) or 2) one framework
> shares APIs with another (dask, modin, pandas). Otherwise, forcing
> parity can be counterproductive. During our work on feature
> transformations, we have seen major differences in supported feature
> transformations, user APIs, and configurations among ML systems. For
> instance, TensorFlow tunes its APIs based on the expected use cases
> (neural networks) and data characteristics (text, image), while sklearn
> aims for traditional ML jobs. Moreover, some API changes are required
> to be able to use certain underlying optimizations.
>
> Having said that, it is definitely important to support popular
> builtins; however, I don't think it is necessary to use the same names,
> APIs, and flags. I liked the idea of writing our documentation in a way
> that helps new users draw similarities with popular libraries. A
> capability matrix to map builtins from other systems to ours can be
> helpful.
>
> Regards,
> Arnab..
>
> On Tue, Aug 2, 2022 at 6:16 AM Janardhan <janard...@apache.org> wrote:
>
> > Hi Badrul,
> >
> > Adding to this discussion,
> > I think we can start with what we already have implemented. We do
> > not need to implement every last function; we can choose a
> > use-case-based approach for best results. I would start with the
> > present status of the builtins - they are enough for a lot of use
> > cases! Then implement the rest one by one based on priority. Most of
> > our builtin functions other than ML (including the NN library) are
> > inspired by the R language.
> >
> > During the implementation/testing, we might need to modify, or could
> > find optimization opportunities for, our system internals.
> >
> > One of the approaches:
> > 1. Take an algorithm/product that is already implemented in another
> >    system/library.
> > 2. Find places where SystemDS can perform better. Find the
> >    low-hanging fruit: can we use one of our Python builtins, or a
> >    combination, to achieve similar or better results, and can we
> >    improve it further?
> > 3. So, we have identified a candidate for a builtin.
> > 4. Repeat the cycle.
> >
> > Best regards,
> > Janardhan
> >
> > On Tue, Aug 2, 2022 at 2:09 AM Badrul Chowdhury
> > <badrulchowdhur...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I wanted to start a discussion on building parity of built-in
> > > functions with popular OSS libraries. I am thinking of attaining
> > > parity as a 3-step process:
> > >
> > > *Step 1*
> > > As far as I can tell from the existing built-in functions,
> > > SystemDS aims to offer a hybrid set of APIs for scientific
> > > computing and ML (data engineering included) to users. Therefore,
> > > the most obvious OSS libraries for comparison would be numpy,
> > > sklearn (scipy), and pandas. Apache DataSketches would be another
> > > relevant system for specialized use cases (sketches).
> > >
> > > *Step 2*
> > > Once we have established a set of libraries, I would propose that
> > > we create a capability matrix with sections for each library, like
> > > so:
> > >
> > > Section 1: numpy
> > >
> > > f_1
> > >
> > > f_2
> > >
> > > [..]
> > >
> > > f_n
> > >
> > > Section 2: sklearn
> > >
> > > [..]
> > >
> > > The columns could be a checklist like this: f_i -> (DML, Python,
> > > CP, SP, RowCol, Row, Col, Federated, documentationPublished)
> > >
> > > *Step 3*
> > > Create JIRA tasks, assign them, and start coding.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Badrul
> > >

--
Cheers,
Badrul