Re: Extending Codegen algorithm tests for heuristics

Chamath Abeysinghe Tue, 13 Mar 2018 07:36:59 -0700

Hi Matthias,
I am working on the SYSTEMML-2169 issue and have sent a partially completed
PR (https://github.com/apache/systemml/pull/747). After those changes, some
test cases in NaryRBindTest are failing, and I could not understand the
reason.
The test cases fail with the following error:
Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
    at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
    at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
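
For readers following the stack trace, here is a minimal sketch of the kind of check that produces such a message. It is not the SystemML source: the classes Block and MergeSketch are hypothetical, and the sketch assumes that the four numbers in the message are the row and column counts of the two blocks being merged (280 x 101 and 1000 x 101).

```java
// Hypothetical stand-in for a matrix block; not the SystemML MatrixBlock class.
final class Block {
    final int rows;
    final int cols;

    Block(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }
}

final class MergeSketch {
    // Merging two partial blocks that share a block index only makes sense
    // when both have identical dimensions, so that is checked first.
    static Block merge(Block b1, Block b2) {
        if (b1.rows != b2.rows || b1.cols != b2.cols) {
            throw new IllegalStateException("Mismatched block sizes for: "
                + b1.rows + " " + b1.cols + " " + b2.rows + " " + b2.cols);
        }
        // A real merge would combine the cell values; the sketch keeps only the shape.
        return new Block(b1.rows, b1.cols);
    }

    public static void main(String[] args) {
        try {
            merge(new Block(280, 101), new Block(1000, 101));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the error would mean that a 280 x 101 block and a 1000 x 101 block ended up under the same block index, so one thing worth checking is whether both inputs of the merge use the same block boundaries.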


Even after debugging the whole process, I could not find the reason for this.
Any suggestion would be really helpful.

If you have any other comments on the PR, I will modify the code
accordingly.

Thanks,
Chamath


On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: Matthias Boehm <[email protected]>
> Date: Tue, Mar 6, 2018 at 10:14 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <[email protected]>
>
>
> Hi Chamath,
>
> great thanks for your contribution - I left a couple of comments but we
> should be ready to merge this in soon. If you want to get a better feeling
> for the distributed spark backend as well, I created SYSTEMML-2169, which
> aims to extend our recently added nary cbind/rbind operations to leverage
> broadcasts when applicable.
>
> Regarding the proposal, most of the backends are rather independent, but
> each backend depends on the language integration. We will help out where
> necessary. So it depends on your interests and ideas. If you're more
> interested in defining the language APIs, make this and a simple backend
> the core of your proposal. If you're more interested in the runtime
> backends, I would help and add a basic language integration in time, which
> would allow you to immediately start working on the backends.
>
> Following the GSoC guidelines it's usually better to underscope the project
> than overscope it because you want to ensure that you're able to
> successfully complete the project in the ambitious timeframe and there will
> always be unforeseen obstacles. I would recommend to define a core project
> and potential extensions you will address if time allows. For example, the
> local, multi-threaded backend can indeed be realized relatively quickly.
> However, subsequently we can add and experiment with Hogwild! (i.e.,
> unsynchronized updates) which is known to work well for sparse models,
> replication and partitioning in NUMA settings, and potentially the
> automatic selection of update strategies.
>
> Regards,
> Matthias
>
>
> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
> [email protected]> wrote:
>
> > Hi,
> > I have sent a pull request for this issue.
> > As a next step, could you suggest a new issue, or anything else I should do
> > to familiarize myself with the language and runtime for the parameter
> > server project?
> >
> > Regarding the project proposal, I have a few questions:
> > * The epic has several subtasks. Is it enough to focus on a single task
> > throughout the summer? Would that be enough workload, or should I go for
> > multiple tasks?
> > * What is the linkage between the subtasks? Do tasks like the distributed
> > Spark backend or the local multi-threaded backend need previous tasks to
> > be completed before work can start?
> >
> > I would be glad if you could suggest some issues related to the
> > distributed Spark backend or the multi-threaded backend tasks.
> >
> > Thanks.
> > Regards,
> > Chamath
> >
> >
> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <[email protected]> wrote:
> >
> >> Hi Chamath,
> >>
> >> in general, you're absolutely right - you can enable -stats and
> >> programmatically probe the heavy hitter statistics for certain opcodes.
> >> However, uamin and uamax stand for "unary aggregate minimum" and "unary
> >> aggregate maximum", which correspond to min(X) and max(X) on script level.
> >> Instead, all generated fused operators are prefixed with spoof or sp_spoof
> >> (for distributed spark operations). The related junit assertion should
> >> already be in the existing tests, I just mentioned it for completeness.
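
The distinction above can be sketched as a small check over opcode names. This is an illustration only: the class and method names are hypothetical, the opcode lists in main are made-up examples rather than real -stats output, and the existing tests already carry their own assertion for this.

```java
import java.util.Arrays;
import java.util.List;

final class FusedOperatorCheck {
    // Fused operators are prefixed with "spoof" or "sp_spoof" (the latter for
    // distributed spark operations); uamin and uamax are ordinary aggregates.
    static boolean isFusedOpcode(String opcode) {
        return opcode.startsWith("spoof") || opcode.startsWith("sp_spoof");
    }

    // True if at least one opcode in the heavy hitter list is a fused operator.
    static boolean containsFusedOperator(List<String> heavyHitterOpcodes) {
        return heavyHitterOpcodes.stream().anyMatch(FusedOperatorCheck::isFusedOpcode);
    }

    public static void main(String[] args) {
        // Hypothetical opcode lists, for illustration only.
        List<String> withFusion = Arrays.asList("uamin", "sp_spoofRA", "ba+*");
        List<String> withoutFusion = Arrays.asList("uamin", "uamax");
        System.out.println(containsFusedOperator(withFusion));
        System.out.println(containsFusedOperator(withoutFusion));
    }
}
```

A run that shows only uamin and uamax among its heavy hitters would therefore not satisfy the "at least one fused operator" criterion under this check.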
> >>
> >> Regards,
> >> Matthias
> >>
> >> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <
> >> [email protected]> wrote:
> >>
> >>> Thanks for your detailed reply.
> >>> I did some coding [1] for the SYSTEMML-2159 issue to extend the test
> >>> cases for FA & FNR. I have a problem with the success criterion
> >>> "generating at least one fused operator". I think this means I have to
> >>> look into the heavy hitter instruction statistics and check whether there
> >>> are any fused operators. (My guess is that uamin and uamax are the
> >>> operators I have to look for, but I am not sure because I don't know the
> >>> meaning of these instructions.)
> >>>
> >>> Please help me clarify this. If my approach is correct, I could send a
> >>> PR after fixing the tests for the other algorithms. Thanks.
> >>>
> >>> Regards,
> >>> Chamath
> >>>
> >>> [1] https://github.com/apache/systemml/compare/master...chamathabeysinghe:SYSTEMML-2159?diff=split&name=SYSTEMML-2159
> >>>
> >>>
> >>> On Tue, Feb 27, 2018 at 1:54 AM, Matthias Boehm <[email protected]> wrote:
> >>>
> >>>> ---------- Forwarded message ----------
> >>>> From: Matthias Boehm <[email protected]>
> >>>> Date: Mon, Feb 26, 2018 at 11:59 AM
> >>>> Subject: Re: Extending Codegen algorithm tests for heuristics
> >>>> To: Chamath Abeysinghe <[email protected]>
> >>>>
> >>>>
> >>>> great - thanks for taking this over Chamath.
> >>>>
> >>>> In general, I would recommend to use this task to explore SystemML a
> >>>> little. For example, take one of the codegen algorithm tests from
> >>>> org.apache.sysml.test.integration.functions.codegenalg (e.g.,
> >>>> AlgorithmL2SVM) and pass different flags such as -stats, -explain,
> >>>> -explain recompile_hops, -explain recompile_runtime to programArgs and
> >>>> try to understand the output. If you come across specific questions,
> >>>> please just ask.
> >>>>
> >>>> To answer your detailed questions:
> >>>>
> >>>> 1) We recently added a code generation framework that automatically
> >>>> identifies opportunities for fused operators and subsequently generates
> >>>> code for these operators. A major part is the selection of fusion plans,
> >>>> for which we provide heuristics and a cost-based optimizer. By default
> >>>> (and thus also in our testsuite), we use the cost-based optimizer, but it
> >>>> would be good to regularly test the heuristics as well.
> >>>>
> >>>> 2) You can configure the used optimizer in your SystemML-config.xml file
> >>>> as follows:
> >>>> <sysml.codegen.optimizer>fuse_all</sysml.codegen.optimizer>
> >>>> Valid alternatives are: fuse_all, fuse_no_redundancy, fuse_cost_based,
> >>>> and fuse_cost_based_v2 (default). You can provide alternative config xml
> >>>> files and switch them dynamically via getConfigTemplateFile.
> >>>>
> >>>> 3) Similar to the existing tests, it needs to (1) run without errors,
> >>>> (2) produce correct results as compared to R, and (3) generate at least
> >>>> one fused operator.
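
As a sketch of point 2, the following writes one small config file per optimizer so that a test could switch between them. The class name, method names, and file naming are hypothetical, and the <root> wrapper element is an assumption about the layout of SystemML-config.xml; only the sysml.codegen.optimizer property and its four values come from the text above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class CodegenConfigSketch {
    // Optimizer names listed in the message above.
    static final List<String> OPTIMIZERS = Arrays.asList(
        "fuse_all", "fuse_no_redundancy", "fuse_cost_based", "fuse_cost_based_v2");

    // Builds the config text; the <root> wrapper is an assumption.
    static String configXml(String optimizer) {
        if (!OPTIMIZERS.contains(optimizer)) {
            throw new IllegalArgumentException("Unknown optimizer: " + optimizer);
        }
        return "<root>\n"
            + "   <sysml.codegen.optimizer>" + optimizer + "</sysml.codegen.optimizer>\n"
            + "</root>\n";
    }

    // Writes the config into dir under a hypothetical file name and returns its path.
    static Path writeConfig(Path dir, String optimizer) throws IOException {
        Path file = dir.resolve("SystemML-config-" + optimizer + ".xml");
        return Files.write(file, configXml(optimizer).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("codegen-conf");
        for (String optimizer : OPTIMIZERS) {
            System.out.println(writeConfig(dir, optimizer));
        }
    }
}
```

A test for a given heuristic would then return the matching file from its getConfigTemplateFile override and apply the three success criteria from point 3.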
> >>>>
> >>>> Regards,
> >>>> Matthias
> >>>>
> >>>> On Mon, Feb 26, 2018 at 6:54 AM, Chamath Abeysinghe <
> >>>> [email protected]> wrote:
> >>>>
> >>>> > Hi All,
> >>>> > As per the guidelines given to GSoC students, I would like to work on
> >>>> > the SYSTEMML-2159 [1] issue as a starting point. But I don't understand
> >>>> > the background of the issue. Can someone help me understand its
> >>>> > context?
> >>>> >
> >>>> > A few questions I have are:
> >>>> >
> >>>> > 1) What are the fusion heuristics fuse-all and fuse-no-redundancy?
> >>>> > 2) Can I pass those heuristic-related configurations as args when
> >>>> > executing DMLScript?
> >>>> > 3) What are the success criteria for a test that uses those heuristics?
> >>>> >
> >>>> > Thank you in advance
> >>>> >
> >>>> > Regards,
> >>>> > Chamath
> >>>> >
> >>>> > [1] https://issues.apache.org/jira/browse/SYSTEMML-2159
> >>>> >
> >>>> > --
> >>>> > Chamath Abeysinghe
> >>>> > Department of Computer Science and Engineering
> >>>> > University of Moratuwa
> >>>> > https://www.facebook.com/chamath.abeysinghe.3
> >>>> > https://lk.linkedin.com/in/chamathabeysinghe
> >>>> > Mobile : +94752930548
> >>>> >
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Chamath Abeysinghe
> >>> Department of Computer Science and Engineering
> >>> University of Moratuwa
> >>> https://www.facebook.com/chamath.abeysinghe.3
> >>> https://lk.linkedin.com/in/chamathabeysinghe
> >>> Mobile : +94752930548
> >>>
> >>
> >>
> >
> >
> > --
> > Chamath Abeysinghe
> > Department of Computer Science and Engineering
> > University of Moratuwa
> > Mobile: +94712803295
> >
>



-- 
Chamath Abeysinghe
Department of Computer Science and Engineering
University of Moratuwa
Mobile: +94712803295
