thanks for the context Jeremy - that helps. I also had an offline conversation with Sasha, and he pointed me to a script that does exactly that (an iterative invert_lower_triangular) combined with a parfor over independent blocks. We'll merge these scripts soon, and I'll reach out individually as necessary. Thanks, everybody, for now.
Regards,
Matthias

On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilme...@us.ibm.com> wrote:
> This may be a duplicate... it was bounced from the dev list.
>
> I think that a scalable triangular inverse will also have similar
> properties, in that there is a sequential approach if it uses back
> substitution.
>
> Most of these algorithms (LU, Cholesky, QR) are inherently sequential,
> and the focus of the work is on minimizing interprocess communication
> during the operations, which may explain why there was only limited
> interest in pursuing this further.
>
> I had originally recommended that the recursive algorithms be rewritten
> as iterative algorithms (and in fact provided an example of the LU in
> iterative form), which would make the counting of operations more
> transparent, as well as reveal possible parallelization points.
>
> Cheers, J
> Jerome Nilmeier, PhD
> Data Scientist and Engineer
> IBM Spark Technology Center
> http://www.spark.tc/
>
> ----- Original message -----
> From: Matthias Boehm <mboe...@gmail.com>
> To: dev@systemml.apache.org
> Cc: Qifan Pu <qifan...@gmail.com>
> Subject: Re: distributed cholesky on systemml
> Date: Sun, Apr 22, 2018 1:21 AM
>
> sure, no problem - thanks again for catching this issue that was hidden
> for a while.
>
> Yes, the same depth-first characteristic applies to the Cholesky
> function as well. In contrast to U_triangular_inv, however, there are
> data dependencies between the blocks per level (at least in the current
> algorithm formulation), which means we cannot use the approach I
> described for U_triangular_inv:
>
> L11 = Cholesky(A11, nb)
> A22 = ... U_triangular_inv(t(L11))
> L22 = Cholesky(A22, nb)
>
> However, note that there are far fewer calls to Cholesky due to the
> switch to the builtin cholesky at the given minimum block size. For
> example, in our new test for dimensions 1362 x 1362 and a minimum block
> size of 200, we call Cholesky 15 times but U_triangular_inv 2,539 times.
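[Editor's note: a minimal NumPy sketch of the recursive scheme quoted above, not the DML script itself. The function name, the halving split, and the base case are assumptions; the structure mirrors the snippet (factor A11, form L21 via an inverse of t(L11), update the trailing block, recurse).]

```python
import numpy as np

def blocked_cholesky(A, nb):
    """Recursive blocked Cholesky sketch: factor the leading block,
    form L21 via a triangular inverse, recurse on the Schur complement."""
    n = A.shape[0]
    if n <= nb:
        # base case: analogous to falling back to the builtin cholesky
        return np.linalg.cholesky(A)
    k = n // 2
    L11 = blocked_cholesky(A[:k, :k], nb)
    # corresponds to the U_triangular_inv(t(L11)) step in the script
    L21 = A[k:, :k] @ np.linalg.inv(L11.T)
    # trailing-block update (Schur complement), i.e. the "A22 = ..." line
    S22 = A[k:, k:] - L21 @ L21.T
    L22 = blocked_cholesky(S22, nb)
    L = np.zeros_like(A)
    L[:k, :k], L[k:, :k], L[k:, k:] = L11, L21, L22
    return L
```

Note the sequential chain the thread describes: L22 depends on S22, which depends on L11, so the two recursive calls per level cannot run concurrently in this formulation.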
>
> For a sufficiently large minimum block size this might be OK for
> Cholesky, because each level also performs a number of matrix multiplies
> that will exploit the available parallelism of your cluster. In that
> regard, you might want to experiment with different block sizes and
> driver memory budgets. If I get a chance, I will also run a number of
> experiments and see if we can rewrite these scripts.
>
> Regards,
> Matthias
>
> On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan...@gmail.com> wrote:
>> Matthias,
>>
>> Thanks so much for taking the time to fix this. Really appreciate it.
>> Does the same reasoning apply to the cholesky script? The recursive
>> approach also looks inherently sequential.
>>
>> Best,
>> Qifan
>>
>> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>>
>>> just as a quick update: this issue has now been fixed in SystemML
>>> master - it was essentially a missing guard for recursive functions
>>> when checking for unary size-preserving functions during
>>> inter-procedural analysis (IPA).
>>>
>>> However, while working with this recursive cholesky function, I came
>>> to the conclusion that it may need some rework. The current top-down,
>>> depth-first approach is inherently sequential. This is partially
>>> unnecessary because, for the recursive function U_triangular_inv
>>> (which is called many more times than cholesky), the blocks per level
>>> are independent. Therefore, we should look into a bottom-up,
>>> breadth-first approach that parallelizes over the blocks in each
>>> level, which could be done via parfor at the script level.
>>>
>>> Regards,
>>> Matthias
>>>
>>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>> > thanks for catching this - I just ran a toy example and this seems
>>> > to be a rewrite issue (there are specific right-indexing rewrites
>>> > that collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U,
>>> > which helps for large distributed matrices).
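[Editor's note: a NumPy sketch of the bottom-up, breadth-first idea discussed above for a lower-triangular inverse. The function name and blocking scheme are made up for illustration; the point is that all blocks on the same off-diagonal level are independent, so the inner loop is exactly where a script-level parfor could parallelize.]

```python
import numpy as np

def blocked_tri_inv(L, nb):
    """Level-wise (bottom-up) inverse of a lower-triangular matrix L.
    Blocks within one level d = i - j have no mutual dependencies."""
    n = L.shape[0]
    b = (n + nb - 1) // nb                       # number of block rows
    blk = lambda i: slice(i * nb, min((i + 1) * nb, n))
    X = np.zeros_like(L)
    # level 0: invert all diagonal blocks - fully independent
    for i in range(b):
        X[blk(i), blk(i)] = np.linalg.inv(L[blk(i), blk(i)])
    # level d: compute blocks (i, i - d); independent within a level,
    # so this inner loop could run as a parfor
    for d in range(1, b):
        for j in range(b - d):
            i = j + d
            # X[i,j] = -X[i,i] @ sum_{k=j..i-1} L[i,k] @ X[k,j],
            # using only X-blocks from strictly lower levels
            acc = sum(L[blk(i), blk(k)] @ X[blk(k), blk(j)]
                      for k in range(j, i))
            X[blk(i), blk(j)] = -X[blk(i), blk(i)] @ acc
    return X
```

Each level only reads results from lower levels, so a breadth-first schedule exposes up to b - d independent block tasks at level d, in contrast to the depth-first recursion.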
As a workaround, you can set
>>> > "sysml.optlevel" to 1 (instead of the default 2), which disables
>>> > all rewrites; that worked fine for me. I'll fix this later today.
>>> > I'll also fix the naming from "Choleskey" to "Cholesky". Thanks
>>> > again.
>>> >
>>> > Regards,
>>> > Matthias
>>> >
>>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan...@gmail.com> wrote:
>>> >> Hi Matthias,
>>> >>
>>> >> Thanks for the fast response and detailed information. This is
>>> >> really helpful.
>>> >>
>>> >> I just tried to run it, and was tracing down an indexing bug that
>>> >> can be reproduced by simply running the test script of the
>>> >> triangular inverse [1]:
>>> >>
>>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid
>>> >> values for matrix indexing: [1667:3333,1:1666] must be within
>>> >> matrix dimensions [1000,1000]
>>> >>
>>> >> Am I missing some configuration here?
>>> >>
>>> >> [1] https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
>>> >>
>>> >> Best,
>>> >> Qifan
>>> >>
>>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboe...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Qifan,
>>> >>>
>>> >>> thanks for your feedback. You're right, the builtin functions
>>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently
>>> >>> only supported as single-node operations because they're still
>>> >>> implemented via Apache Commons Math.
>>> >>>
>>> >>> However, there is an experimental script for distributed cholesky
>>> >>> [1], which uses a recursive approach (with operations that allow
>>> >>> for automatic distributed computation) for matrices larger than a
>>> >>> user-defined block size.
Once blocks become small enough, we
>>> >>> again use the builtin cholesky. Graduating this script would
>>> >>> require a broader set of experiments (and potential improvements),
>>> >>> but it simply has not had the highest priority so far. You might
>>> >>> want to give it a try, though.
>>> >>>
>>> >>> Thanks again for your feedback - we'll consider a higher priority
>>> >>> for these distributed operations when discussing the roadmap for
>>> >>> the next releases.
>>> >>>
>>> >>> [1] https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
>>> >>>
>>> >>> Regards,
>>> >>> Matthias
>>> >>>
>>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan...@gmail.com> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > I would love to do distributed cholesky on a large matrix with
>>> >>> > SystemML. I found two related JIRAs (SYSTEMML-1213,
>>> >>> > SYSTEMML-1163), but AFAIK this is currently not implemented?
>>> >>> > I just wanted to check.
>>> >>> >
>>> >>> > Best,
>>> >>> > Qifan
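[Editor's note: the "sysml.optlevel" workaround mentioned earlier in the thread is set through a SystemML configuration file. A sketch, assuming the standard SystemML-config.xml property layout; the file name is arbitrary and is passed to SystemML via its -config option.]

```xml
<!-- SystemML-config.xml (excerpt): optlevel 1 disables the
     static/dynamic rewrites, avoiding the faulty indexing rewrite -->
<root>
   <sysml.optlevel>1</sysml.optlevel>
</root>
```

Once the rewrite fix described above landed in master, the default optlevel 2 can be restored.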