Thank you both for your thoughtful comments. Agreed: we should not force
parity; rather, we should make sure that SystemDS built-in functions
"cover" important use cases. I will start with an audit of SystemDS's
existing capabilities and create a PR on systemds-website
<https://github.com/apache/systemds-website> with my findings. This would
also be a good way to identify gaps in the documentation for existing
builtins so we can update it.
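
As a rough sketch of how the audit findings could be recorded, here is one
possible shape for a capability-matrix row. The column names come from the
checklist proposed earlier in this thread; the class name, the helper names,
and the sample flags are hypothetical placeholders, not actual SystemDS status.

```python
from dataclasses import dataclass, field

# Checklist columns taken from the proposal in this thread.
COLUMNS = ("DML", "Python", "CP", "SP", "RowCol", "Row", "Col",
           "Federated", "documentationPublished")


@dataclass
class CapabilityRow:
    """One function of an external library and what SystemDS covers for it."""
    library: str    # e.g. "numpy"
    function: str   # e.g. "f_1" (placeholder name)
    supported: dict = field(default_factory=dict)  # column name -> bool

    def gaps(self):
        # Columns that are not marked as supported, in checklist order.
        return [c for c in COLUMNS if not self.supported.get(c, False)]


def render(rows):
    # Plain-text table: "x" marks a covered column, "-" marks a gap.
    lines = [" | ".join(["library", "function", *COLUMNS])]
    for r in rows:
        marks = ["x" if r.supported.get(c, False) else "-" for c in COLUMNS]
        lines.append(" | ".join([r.library, r.function, *marks]))
    return "\n".join(lines)


# Hypothetical example entry; the flags are made up for illustration.
example = CapabilityRow("numpy", "f_1", {"DML": True, "Python": True})
print(render([example]))
print("gaps:", example.gaps())
```

Keeping the gaps computable per row would make it easy to turn the audit
directly into a list of follow-up tasks.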

Thanks,
Badrul

On Tue, 2 Aug 2022 at 06:12, arnab phani <phaniar...@gmail.com> wrote:

> In my understanding, parity matters if 1) frameworks share a similar user
> base and use cases (sklearn, pandas, etc.), or 2) one framework shares APIs
> with another (dask, modin, pandas). Otherwise, forcing parity can be
> counterproductive. During our work on feature transformations, we have seen
> major differences in supported feature transformations, user APIs, and
> configurations among ML systems. For instance, TensorFlow tunes its APIs
> based on the expected use cases (neural networks) and data characteristics
> (text, image), while sklearn aims for traditional ML jobs. Moreover, some
> API changes are required to be able to use certain underlying optimizations.
> Having said that, it is definitely important to support popular builtins.
> However, I don't think it is necessary to use the same names, APIs, and
> flags. I liked the idea of writing our documentation in a way that helps new
> users draw similarities with popular libraries. A capability matrix that
> maps builtins from other systems to ours can be helpful.
>
> Regards,
> Arnab..
>
> On Tue, Aug 2, 2022 at 6:16 AM Janardhan <janard...@apache.org> wrote:
>
> > Hi Badrul,
> >
> > Adding to this discussion,
> > I think we can start with what we already have implemented. We do not
> > need to implement every last function; we can choose a use-case-based
> > approach for best results. I would start with the present status of
> > the builtins (they are enough for a lot of use cases!) and then implement
> > the missing ones one by one, based on priority. Most of our builtin
> > functions other than ML (including the NN library) are inspired by the
> > R language.
> >
> > During the implementation/testing, we might need to modify our system
> > internals, or we could find optimization opportunities in them.
> >
> > One of the approaches:
> > 1. Take an algorithm/product that is already implemented in another
> > system/library.
> > 2. Find places where SystemDS can perform better. Look for the low-hanging
> > fruit: can we use one of our Python builtins, or a combination of them, to
> > achieve similar or better results? And can we improve it further?
> > 3. This identifies a candidate for a builtin.
> > 4. Repeat the cycle.
> >
> >
> > Best regards,
> > Janardhan
> >
> >
> >
> > On Tue, Aug 2, 2022 at 2:09 AM Badrul Chowdhury
> > <badrulchowdhur...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > > I wanted to start a discussion on building parity of built-in functions
> > > > with popular OSS libraries. I am thinking of attaining parity as a
> > > > 3-step process:
> > >
> > > *Step 1*
> > > > As far as I can tell from the existing built-in functions, SystemDS
> > > > aims to offer a hybrid set of APIs for scientific computing and ML
> > > > (data engineering included) to users. Therefore, the most obvious OSS
> > > > libraries for comparison would be numpy, sklearn (scipy), and pandas.
> > > > Apache DataSketches would be another relevant system for specialized
> > > > use cases (sketches).
> > >
> > > *Step 2*
> > > > Once we have established a set of libraries, I would propose that we
> > > > create a capability matrix with sections for each library, like so:
> > >
> > > Section 1: numpy
> > >
> > > f_1
> > >
> > > f_2
> > >
> > > [..]
> > >
> > >
> > > f_n
> > >
> > > Section 2: sklearn
> > >
> > > [..]
> > >
> > >
> > > > The columns could be a checklist like this: f_i -> (DML, Python, CP,
> > > > SP, RowCol, Row, Col, Federated, documentationPublished)
> > >
> > > *Step 3*
> > > Create JIRA tasks, assign them, and start coding.
> > >
> > >
> > > Thoughts?
> > >
> > >
> > > Thanks,
> > > Badrul
> >
>


-- 

Cheers,
Badrul
