Fwd: Extending Codegen algorithm tests for heuristics

Matthias Boehm Tue, 13 Mar 2018 19:13:23 -0700

---------- Forwarded message ----------
From: Matthias Boehm <mboe...@gmail.com>
Date: Tue, Mar 13, 2018 at 1:00 PM
Subject: Re: Extending Codegen algorithm tests for heuristics
To: Chamath Abeysinghe <abeysinghecham...@gmail.com>



Without debugging it's hard to tell, but usually something like this
happens if blocks are incorrectly aligned. So I would recommend simply
doing a mapToPair before the RDDAggregateUtils.mergeByKey and validating
the correctness of the shifted block indexes. Maybe the newly introduced
broadcasts are not shifted into their target positions? For example,
consider cbind(A,B): before aggregation, B needs to be shifted by ncol(A).
Furthermore, it would be great to avoid unnecessary aggregation if all but
one of the inputs are broadcasts.
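
To make the shift concrete, here is a minimal, self-contained sketch in
plain Java. It is not SystemML code: the class and method names are
hypothetical, and it assumes 1-based block indexes and that ncol(A) is a
multiple of the block size, so whole blocks of B move without being split.
The same reasoning would apply to rbind with row block indexes and nrow(A).

```java
public class CbindShiftSketch {

    // Number of column blocks needed for ncol columns at block size blen.
    static long numColBlocks(long ncol, int blen) {
        return (ncol + blen - 1) / blen;
    }

    // Target column block index of a block of B within cbind(A, B).
    // Assumption: block indexes are 1-based and ncol(A) is a multiple of
    // blen, so whole blocks move without being split.
    static long shiftedColBlockIndex(long colBlockInB, long ncolA, int blen) {
        if (ncolA % blen != 0)
            throw new IllegalArgumentException(
                "unaligned ncol(A): blocks of B would have to be split");
        return colBlockInB + ncolA / blen;
    }

    // Validation of the kind one might run in a mapToPair before merging:
    // a shifted block of B must land after A's blocks and inside the output.
    static boolean isValidTarget(long target, long ncolA, long ncolB, int blen) {
        return target > numColBlocks(ncolA, blen)
            && target <= numColBlocks(ncolA + ncolB, blen);
    }

    public static void main(String[] args) {
        int blen = 1000;
        long ncolA = 2000, ncolB = 1500;
        for (long j = 1; j <= numColBlocks(ncolB, blen); j++) {
            long target = shiftedColBlockIndex(j, ncolA, blen);
            System.out.println("B block " + j + " -> output block " + target
                + " valid=" + isValidTarget(target, ncolA, ncolB, blen));
        }
    }
}
```

A check like isValidTarget flags a block that was left at its original
index, since it would collide with a block of A under the same key.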

Regards,
Matthias

On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <
abeysinghecham...@gmail.com> wrote:

> Hi Matthias,
> I am working on the SYSTEMML-2169 issue and have sent a partially completed
> PR ( https://github.com/apache/systemml/pull/747 ). After those changes,
> some test cases in NaryRBindTest are failing, and I could not understand
> the reason.
> The test cases fail with the following error:
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
>   at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
>   at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
>
> Even after debugging the whole process I could not find a reason for this.
> Any suggestions would be really helpful.
>
> If you have any other comments regarding the PR, I can modify the code
> accordingly.
>
> Thanks,
> Chamath
>
>
> On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>
>> ---------- Forwarded message ----------
>> From: Matthias Boehm <mboe...@gmail.com>
>> Date: Tue, Mar 6, 2018 at 10:14 PM
>> Subject: Re: Extending Codegen algorithm tests for heuristics
>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>>
>>
>> Hi Chamath,
>>
>> great thanks for your contribution - I left a couple of comments but we
>> should be ready to merge this in soon. If you want to get a better feeling
>> for the distributed spark backend as well, I created SYSTEMML-2169, which
>> aims to extend our recently added nary cbind/rbind operations to leverage
>> broadcasts when applicable.
>>
>> Regarding the proposal, most of the backends are rather independent, but
>> each backend depends on the language integration. We will help out where
>> necessary. So it depends on your interests and ideas. If you're more
>> interested in defining the language APIs, make this and a simple backend
>> the core of your proposal. If you're more interested in the runtime
>> backends, I would help and add a basic language integration in time, which
>> would allow you to immediately start working on the backends.
>>
>> Following the GSoC guidelines, it's usually better to underscope the
>> project than to overscope it, because you want to ensure that you're able
>> to successfully complete the project in the ambitious timeframe, and there
>> will always be unforeseen obstacles. I would recommend defining a core
>> project and potential extensions you will address if time allows. For
>> example, the local, multi-threaded backend can indeed be realized
>> relatively quickly. However, subsequently we can add and experiment with
>> Hogwild! (i.e., unsynchronized updates), which is known to work well for
>> sparse models, replication and partitioning in NUMA settings, and
>> potentially the automatic selection of update strategies.
>>
>> Regards,
>> Matthias
>>
>>
>> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
>> abeysinghecham...@gmail.com> wrote:
>>
>> > Hi,
>> > I have sent a pull request for this issue.
>> > As a next step, could you suggest a new issue, or anything else I should
>> > do to familiarize myself with the language and runtime for the parameter
>> > servers project?
>> >
>> > Regarding writing the project proposal, I have a few questions.
>> > * The epic has a few subtasks. Is it enough to focus on a single task
>> > throughout the summer? Would that be enough workload, or should I go for
>> > multiple tasks?
>> > * What is the linkage between subtasks? Do tasks like the distributed
>> > Spark backend or the local multi-threaded backend need previous tasks to
>> > be completed before work can start?
>> >
>> > I would be glad if you could suggest some issues related to the
>> > distributed Spark backend or the multi-threaded backend tasks.
>> >
>> > Thanks.
>> > Regards,
>> > Chamath
>> >
>> >
>> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com>
>> > wrote:
>> >
>> >> Hi Chamath,
>> >>
>> >> in general, you're absolutely right - you can enable -stats and
>> >> programmatically probe the heavy hitter statistics for certain opcodes.
>> >> However, uamin and uamax stand for "unary aggregate minimum" and "unary
>> >> aggregate maximum", which correspond to min(X) and max(X) on script
>> >> level. Instead, all generated fused operators are prefixed with spoof or
>> >> sp_spoof (for distributed Spark operations). The related JUnit assertion
>> >> should already be in the existing tests; I just mentioned it for
>> >> completeness.
>> >>
>> >> Regards,
>> >> Matthias
>> >>
>> >> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <
>> >> abeysinghecham...@gmail.com> wrote:
>> >>
>> >>> Thanks for your detailed reply.
>> >>> I did some coding [1] for the issue SYSTEMML-2159 to extend the test
>> >>> cases for FA & FNR. I have a question regarding the success criterion
>> >>> "generating at least one fused operator": I think this means I have to
>> >>> look into the stats of heavy hitter instructions and check whether there
>> >>> are any fused operators. (My guess is that uamin and uamax are the
>> >>> operators I have to look for, but I am not sure about this because I
>> >>> don't know the meaning of these instructions.)
>> >>>
>> >>> Please help me clarify this. If my approach is correct, I could send a
>> >>> PR after fixing the tests for the other algorithms. Thanks.
>> >>>
>> >>> Regards,
>> >>> Chamath
>> >>>
>> >>> [1] https://github.com/apache/systemml/compare/master...chamathabeysinghe:SYSTEMML-2159?diff=split&name=SYSTEMML-2159
>> >>>
>> >>>
>> >>> On Tue, Feb 27, 2018 at 1:54 AM, Matthias Boehm <mboe...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> ---------- Forwarded message ----------
>> >>>> From: Matthias Boehm <mboe...@gmail.com>
>> >>>> Date: Mon, Feb 26, 2018 at 11:59 AM
>> >>>> Subject: Re: Extending Codegen algorithm tests for heuristics
>> >>>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>> >>>>
>> >>>>
>> >>>> great - thanks for taking this over Chamath.
>> >>>>
>> >>>> In general, I would recommend using this task to explore SystemML a
>> >>>> little. For example, take one of the codegen algorithm tests from
>> >>>> org.apache.sysml.test.integration.functions.codegenalg (e.g.,
>> >>>> AlgorithmL2SVM), pass different flags such as -stats, -explain,
>> >>>> -explain recompile_hops, and -explain recompile_runtime to programArgs,
>> >>>> and try to understand the output. If you come across specific
>> >>>> questions, please just ask.
>> >>>>
>> >>>> To answer your detailed questions:
>> >>>>
>> >>>> 1) We recently added a code generation framework that automatically
>> >>>> identifies opportunities for fused operators and subsequently
>> >>>> generates code for these operators. A major part is the selection of
>> >>>> fusion plans, for which we provide heuristics and a cost-based
>> >>>> optimizer. By default (and thus also in our testsuite), we use the
>> >>>> cost-based optimizer, but it would be good to regularly test the
>> >>>> heuristics as well.
>> >>>>
>> >>>> 2) You can configure the optimizer to use in your SystemML-config.xml
>> >>>> file as follows:
>> >>>> <sysml.codegen.optimizer>fuse_all</sysml.codegen.optimizer>
>> >>>> Valid alternatives are: fuse_all, fuse_no_redundancy, fuse_cost_based,
>> >>>> and fuse_cost_based_v2 (default). You can provide alternative config
>> >>>> xml files and switch them dynamically via getConfigTemplateFile.
>> >>>>
>> >>>> 3) Similar to the existing tests, it needs to (1) run without errors,
>> >>>> (2) produce correct results as compared to R, and (3) generate at
>> >>>> least one fused operator.
>> >>>>
>> >>>> Regards,
>> >>>> Matthias
>> >>>>
>> >>>> On Mon, Feb 26, 2018 at 6:54 AM, Chamath Abeysinghe <
>> >>>> abeysinghecham...@gmail.com> wrote:
>> >>>>
>> >>>> > Hi All,
>> >>>> > As per the guidelines given to GSoC students, I would like to work
>> >>>> > on the SYSTEMML-2159 [1] issue as a starting point. But I don't
>> >>>> > understand the background of the issue. Can someone help me
>> >>>> > understand the context of this issue?
>> >>>> >
>> >>>> > A few questions I have:
>> >>>> >
>> >>>> > 1) What are the fusion heuristics fuse-all and fuse-no-redundancy?
>> >>>> > 2) Can I pass those heuristic-related configurations as args when
>> >>>> > executing DMLScript?
>> >>>> > 3) What are the success criteria for a test that uses those
>> >>>> > heuristics?
>> >>>> >
>> >>>> > Thank you in advance
>> >>>> >
>> >>>> > Regards,
>> >>>> > Chamath
>> >>>> >
>> >>>> > [1] https://issues.apache.org/jira/browse/SYSTEMML-2159
>> >>>> >
>> >>>> > --
>> >>>> > Chamath Abeysinghe
>> >>>> > Department of Computer Science and Engineering
>> >>>> > University of Moratuwa
>> >>>> > Mobile : +94752930548
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > Chamath Abeysinghe
>> > Department of Computer Science and Engineering
>> > University of Moratuwa
>> > Mobile: +94712803295
>> >
>>
>
>
>
>
