thanks for the context Jeremy - that helps. I also had an offline conversation with Sasha, and he pointed me to a script that does exactly that (an iterative invert_lower_triangular) combined with a parfor over independent blocks. We'll merge these scripts soon, and I'll reach out individually as necessary. Thanks, everybody, for now.
Regards,
Matthias

On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilme...@us.ibm.com> wrote:
> This may be a duplicate... it was bounced from the dev list.
>
> I think that a scalable triangular inverse will also have similar
> properties, in that there is a sequential approach if it uses back
> substitution.
>
> Most of these algorithms (LU, Cholesky, QR) are inherently sequential,
> and the focus of the work is on minimizing interprocess communication
> during the operations, which may explain why there was only limited
> interest in pursuing this further.
>
> I had originally recommended that the recursive algorithms be rewritten
> as iterative algorithms (and in fact provided an example of the LU in
> iterative form), which would make the counting of operations more
> transparent, as well as reveal possible parallelization points.
>
> Cheers, J
> Jerome Nilmeier, PhD
> Data Scientist and Engineer
> IBM Spark Technology Center
> http://www.spark.tc/
>
> ----- Original message -----
> From: Matthias Boehm <mboe...@gmail.com>
> To: dev@systemml.apache.org
> Cc: Qifan Pu <qifan...@gmail.com>
> Subject: Re: distributed cholesky on systemml
> Date: Sun, Apr 22, 2018 1:21 AM
>
> sure, no problem - thanks again for catching this issue that was hidden
> for a while.
>
> Yes, the same depth-first characteristic applies to the Cholesky
> function as well. In contrast to U_triangular_inv, however, there are
> data dependencies between the blocks per level (at least in the current
> algorithm formulation), which means we cannot use the approach I
> described for U_triangular_inv:
>
> L11 = Cholesky(A11, nb)
> A22 = ... U_triangular_inv(t(L11))
> L22 = Cholesky(A22, nb)
>
> However, note that there are far fewer calls to Cholesky due to the
> switch to the builtin cholesky at the given minimum block size. For
> example, in our new test for dimensions 1362 x 1362 and a minimum block
> size of 200, we call Cholesky 15 times but U_triangular_inv 2,539 times.
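[Editor's note: a minimal NumPy sketch of the recursive scheme quoted above, not the DML script itself. The function name, the halving split, and the base case are assumptions; the structure mirrors the snippet (factor A11, form L21 via an inverse of t(L11), update the trailing block, recurse).]

```python
import numpy as np

def blocked_cholesky(A, nb):
    """Recursive blocked Cholesky sketch: factor the leading block,
    form L21 via a triangular inverse, recurse on the Schur complement."""
    n = A.shape[0]
    if n <= nb:
        # base case: analogous to falling back to the builtin cholesky
        return np.linalg.cholesky(A)
    k = n // 2
    L11 = blocked_cholesky(A[:k, :k], nb)
    # corresponds to the U_triangular_inv(t(L11)) step in the script
    L21 = A[k:, :k] @ np.linalg.inv(L11.T)
    # trailing-block update (Schur complement), i.e. the "A22 = ..." line
    S22 = A[k:, k:] - L21 @ L21.T
    L22 = blocked_cholesky(S22, nb)
    L = np.zeros_like(A)
    L[:k, :k], L[k:, :k], L[k:, k:] = L11, L21, L22
    return L
```

Note the sequential chain the thread describes: L22 depends on S22, which depends on L11, so the two recursive calls per level cannot run concurrently in this formulation.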
>
> For a sufficiently large minimum block size this might be OK for
> Cholesky, because each level also performs a number of matrix multiplies
> that will exploit the available parallelism of your cluster. In that
> regard, you might want to experiment with different block sizes and
> driver memory budgets. If I get a chance, I will also run a number of
> experiments and see if we can rewrite these scripts.
>
> Regards,
> Matthias
>
> On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan...@gmail.com> wrote:
>> Matthias,
>>
>> Thanks so much for taking the time to fix this. Really appreciate it.
>> Does the same reasoning apply to the cholesky script? The recursive
>> approach also looks inherently sequential.
>>
>> Best,
>> Qifan
>>
>> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>>
>>> just as a quick update: this issue has now been fixed in SystemML
>>> master - it was essentially a missing guard for recursive functions
>>> when checking for unary size-preserving functions during
>>> inter-procedural analysis (IPA).
>>>
>>> However, while working with this recursive cholesky function, I came
>>> to the conclusion that it may need some rework. The current top-down,
>>> depth-first approach is inherently sequential. This is partially
>>> unnecessary because, for the recursive function U_triangular_inv
>>> (which is called many more times than cholesky), the blocks per level
>>> are independent. Therefore, we should look into a bottom-up,
>>> breadth-first approach that parallelizes over the blocks in each
>>> level, which could be done via parfor at the script level.
>>>
>>> Regards,
>>> Matthias
>>>
>>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>> > thanks for catching this - I just ran a toy example and this seems
>>> > to be a rewrite issue (there are specific right-indexing rewrites
>>> > that collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U,
>>> > which helps for large distributed matrices).
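[Editor's note: a NumPy sketch of the bottom-up, breadth-first idea discussed above for a lower-triangular inverse. The function name and blocking scheme are made up for illustration; the point is that all blocks on the same off-diagonal level are independent, so the inner loop is exactly where a script-level parfor could parallelize.]

```python
import numpy as np

def blocked_tri_inv(L, nb):
    """Level-wise (bottom-up) inverse of a lower-triangular matrix L.
    Blocks within one level d = i - j have no mutual dependencies."""
    n = L.shape[0]
    b = (n + nb - 1) // nb                       # number of block rows
    blk = lambda i: slice(i * nb, min((i + 1) * nb, n))
    X = np.zeros_like(L)
    # level 0: invert all diagonal blocks - fully independent
    for i in range(b):
        X[blk(i), blk(i)] = np.linalg.inv(L[blk(i), blk(i)])
    # level d: compute blocks (i, i - d); independent within a level,
    # so this inner loop could run as a parfor
    for d in range(1, b):
        for j in range(b - d):
            i = j + d
            # X[i,j] = -X[i,i] @ sum_{k=j..i-1} L[i,k] @ X[k,j],
            # using only X-blocks from strictly lower levels
            acc = sum(L[blk(i), blk(k)] @ X[blk(k), blk(j)]
                      for k in range(j, i))
            X[blk(i), blk(j)] = -X[blk(i), blk(i)] @ acc
    return X
```

Each level only reads results from lower levels, so a breadth-first schedule exposes up to b - d independent block tasks at level d, in contrast to the depth-first recursion.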
As a workaround, you can set
>>> > "sysml.optlevel" to 1 (instead of the default 2), which disables
>>> > all rewrites; that worked fine for me. I'll fix this later today.
>>> > I'll also fix the naming from "Choleskey" to "Cholesky". Thanks
>>> > again.
>>> >
>>> > Regards,
>>> > Matthias
>>> >
>>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan...@gmail.com> wrote:
>>> >> Hi Matthias,
>>> >>
>>> >> Thanks for the fast response and detailed information. This is
>>> >> really helpful.
>>> >>
>>> >> I just tried to run it, and was tracing down an indexing bug that
>>> >> can be reproduced by simply running the test script of the
>>> >> triangular inverse [1]:
>>> >>
>>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid
>>> >> values for matrix indexing: [1667:3333,1:1666] must be within
>>> >> matrix dimensions [1000,1000]
>>> >>
>>> >> Am I missing some configuration here?
>>> >>
>>> >> [1] https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
>>> >>
>>> >> Best,
>>> >> Qifan
>>> >>
>>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Qifan,
>>> >>>
>>> >>> thanks for your feedback. You're right, the builtin functions
>>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently
>>> >>> only supported as single-node operations because they're still
>>> >>> implemented via Apache Commons Math.
>>> >>>
>>> >>> However, there is an experimental script for distributed cholesky
>>> >>> [1], which uses a recursive approach (with operations that allow
>>> >>> for automatic distributed computation) for matrices larger than a
>>> >>> user-defined block size.
Once blocks become small enough, we
>>> >>> again use the builtin cholesky. Graduating this script would
>>> >>> require a broader set of experiments (and potential improvements),
>>> >>> but it simply has not had the highest priority so far. You might
>>> >>> want to give it a try, though.
>>> >>>
>>> >>> Thanks again for your feedback - we'll consider a higher priority
>>> >>> for these distributed operations when discussing the roadmap for
>>> >>> the next releases.
>>> >>>
>>> >>> [1] https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
>>> >>>
>>> >>> Regards,
>>> >>> Matthias
>>> >>>
>>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan...@gmail.com> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > I would love to do distributed cholesky on a large matrix with
>>> >>> > SystemML. I found two related JIRAs (SYSTEMML-1213,
>>> >>> > SYSTEMML-1163), but AFAIK this is currently not implemented?
>>> >>> > I just wanted to check.
>>> >>> >
>>> >>> > Best,
>>> >>> > Qifan
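[Editor's note: the "sysml.optlevel" workaround mentioned earlier in the thread is set through a SystemML configuration file. A sketch, assuming the standard SystemML-config.xml property layout; the file name is arbitrary and is passed to SystemML via its -config option.]

```xml
<!-- SystemML-config.xml (excerpt): optlevel 1 disables the
     static/dynamic rewrites, avoiding the faulty indexing rewrite -->
<root>
   <sysml.optlevel>1</sysml.optlevel>
</root>
```

Once the rewrite fix described above landed in master, the default optlevel 2 can be restored.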