Thanks, Imran. I think it is a good idea to start with the DML-bodied
function implementation. It will serve until we have a built-in
implementation.
We prototyped an implementation of distributed Cholesky as a DML-bodied
function as well. For performance optimization, as the matrix becomes
"small" enough, we switched over and exploit a single node
Adding a new svd() built-in function that initially routes to a local
library is fine. I don't know whether Apache Commons Math has an
implementation that can be reused.
I object to renaming the functions or changing the externals. Eventually,
distributed instructions need to be added to these implementations, and
there are open JIRAs for it.
Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com
From: Niketan Pansare/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 10/21/2016 01:14 PM
Subject: Re: Local versions of Linear Algebra Operators in DML
I am also comfortable with option (2) ... "with a plan to implement its
distributed version"
Thanks,
Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
From: Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apache.org
Date: 10/21/2016 01:00 PM
Subject: Re: Local versions of Linear Algebra Operators in DML
Thanks, Nakul, for reaching out before starting work on this. Actually,
the introduction of these CP-only builtin functions was a big mistake
because (as you already mentioned) they mistakenly suggest that we
provide distributed operations for them too. The intent was to support
them in later versions with our own local and distributed
implementations. So far, this had low priority though because these
O(n^3) operations are seldom used over large data. However, a while
back, we lost potential users who were specifically interested in
distributed eigen - so there are still use cases.
Despite the good intentions behind the renaming, I would strongly argue
against it. First, it would unnecessarily lose compatibility with R
syntax. Second, it would defeat our clean abstraction by exposing
explicit local operations.
This leaves us with two options here: (1) you could use an external
(Java-implemented) function, which gives you virtually the same runtime
behavior but a clear separation via an explicit registration, or (2) add
it to the list of CP-only operations (with a plan to implement its
distributed version) but name it 'svd' as in R.
Regards,
Matthias
On 10/21/2016 9:34 PM, Nakul Jindal wrote:
Hi,
Imran was planning on implementing a distributed SVD as a DML-bodied
function.
The algorithm is described in the paper titled "A Distributed and
Incremental SVD Algorithm for Agglomerative Data Analysis on Large
Networks" available at https://arxiv.org/abs/1601.07010.
This algorithm requires the availability of a local SVD function, which
we currently do not have in SystemML.
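
To illustrate why a local SVD is the building block here, the following
is a minimal Python sketch, not Imran's implementation and not SystemML
code. It assumes the matrix is partitioned into column blocks and relies
on the identity that A = [A1 | A2 | ...] and [U1*S1 | U2*S2 | ...] have
the same singular values, where Ai = Ui*Si*Vi^T is the local SVD of each
block. The names local_us, singular_values and
distributed_singular_values are hypothetical, and the one-sided Jacobi
routine is only a stand-in for whatever local svd() would be provided.

```python
import math

def local_us(A, sweeps=50, tol=1e-12):
    """Stand-in local SVD (one-sided Jacobi); returns U*Sigma as a list of rows."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # column-major copy
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already (numerically) orthogonal
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def singular_values(A):
    """Singular values are the column norms of U*Sigma, sorted descending."""
    US = local_us(A)
    norms = (math.sqrt(sum(row[j] ** 2 for row in US)) for j in range(len(US[0])))
    return sorted(norms, reverse=True)

def distributed_singular_values(A, block_width):
    """Split A by columns, run the local SVD per block, then merge the results."""
    merged = [[] for _ in A]
    for start in range(0, len(A[0]), block_width):
        block = [row[start:start + block_width] for row in A]
        for out_row, us_row in zip(merged, local_us(block)):
            out_row.extend(us_row)  # append this block's U*Sigma columns
    return singular_values(merged)

# Small example matrix (made up for illustration).
A = [[4.0, 1.0, 0.0, 2.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 1.0],
     [2.0, 0.0, 1.0, 6.0]]
direct = singular_values(A)
merged = distributed_singular_values(A, block_width=2)
```

In this sketch each call to local_us on a block could run on a different
worker, and only the U*Sigma factors need to be gathered for the final
merge. The sketch compares singular values only; recovering the right
singular vectors takes extra work.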
Seeing as how there are other linear algebra functions (eigen, lu, qr,
cholesky) in DML that reroute to Apache Commons Math and only operate in
standalone/CP mode, would it be ok to add "svd" to this set?
Also, since these operations are local and not distributed, and the
documentation doesn't make it clear that they won't operate in
distributed mode, would it make sense to rename them to "local_eigen",
"local_qr", "local_cholesky", etc.?
Obviously, this change would go into the version after 0.11.
I understand that the ideal solution to this problem is to have a
distributed version of the aforementioned linear algebra routines, but
for the time being, would it be ok to go ahead and do the rename, while
also introducing a "local_svd"?
Niketan, Berthold, Matthias, Sasha - Any thoughts?
Thanks,
Nakul Jindal