Fwd: Extending Codegen algorithm tests for heuristics

Matthias Boehm Tue, 06 Mar 2018 22:15:34 -0800

---------- Forwarded message ----------
From: Matthias Boehm <mboe...@gmail.com>
Date: Tue, Mar 6, 2018 at 10:14 PM
Subject: Re: Extending Codegen algorithm tests for heuristics
To: Chamath Abeysinghe <abeysinghecham...@gmail.com>



Hi Chamath,

Great, thanks for your contribution. I left a couple of comments, but we
should be ready to merge this soon. If you want to get a better feeling
for the distributed Spark backend as well, I created SYSTEMML-2169, which
aims to extend our recently added n-ary cbind/rbind operations to leverage
broadcasts when applicable.

Regarding the proposal, most of the backends are rather independent, but
each backend depends on the language integration. We will help out where
necessary. So it depends on your interests and ideas. If you're more
interested in defining the language APIs, make this and a simple backend
the core of your proposal. If you're more interested in the runtime
backends, I would help and add a basic language integration in time, which
would allow you to start working on the backends immediately.

Following the GSoC guidelines, it's usually better to underscope the project
than to overscope it, because you want to ensure that you're able to
successfully complete the project in the ambitious timeframe, and there will
always be unforeseen obstacles. I would recommend defining a core project
and potential extensions you will address if time allows. For example, the
local, multi-threaded backend can indeed be realized relatively quickly.
However, we can subsequently add and experiment with Hogwild! (i.e.,
unsynchronized updates), which is known to work well for sparse models,
replication and partitioning in NUMA settings, and potentially the
automatic selection of update strategies.
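
To make the Hogwild! idea concrete, here is a minimal sketch in plain Python,
not SystemML code: several threads run SGD on one shared weight vector for a
sparse linear model and write their updates without any lock. The data
generator, function names, learning rate, and thread count are all
illustrative assumptions. Note that CPython's global interpreter lock
serializes bytecode execution, so this shows the update scheme rather than a
real parallel speedup.

```python
import random
import threading


def make_sparse_data(n_rows, n_features, nnz_per_row, seed=0):
    """Synthetic sparse rows [(index, value), ...] labeled by a hidden linear model."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    rows = []
    for _ in range(n_rows):
        idx = rng.sample(range(n_features), nnz_per_row)
        feats = [(j, rng.uniform(-1.0, 1.0)) for j in idx]
        label = sum(true_w[j] * v for j, v in feats)
        rows.append((feats, label))
    return rows


def mean_squared_error(weights, rows):
    """Average squared prediction error of a linear model on sparse rows."""
    total = 0.0
    for feats, label in rows:
        pred = sum(weights[j] * v for j, v in feats)
        total += (pred - label) ** 2
    return total / len(rows)


def hogwild_worker(weights, rows, lr, epochs, seed):
    """Run SGD on one data partition, updating the shared weights without a lock."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            feats, label = rows[i]
            err = sum(weights[j] * v for j, v in feats) - label
            # Unsynchronized write: only the few coordinates that are
            # non-zero in this row are touched, so conflicts between
            # threads are rare when the data is sparse.
            for j, v in feats:
                weights[j] -= lr * err * v


def train_hogwild(rows, n_features, n_threads=4, lr=0.1, epochs=10):
    """Partition the rows across threads that all share one weight vector."""
    weights = [0.0] * n_features
    threads = [
        threading.Thread(
            target=hogwild_worker,
            args=(weights, rows[t::n_threads], lr, epochs, t),
        )
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return weights


if __name__ == "__main__":
    data = make_sparse_data(n_rows=500, n_features=50, nnz_per_row=5)
    print("loss before:", mean_squared_error([0.0] * 50, data))
    print("loss after: ", mean_squared_error(train_hogwild(data, 50), data))
```

Because the threads race on the shared vector, the exact weights differ from
run to run; the point of the sketch is only that the loss still drops without
any synchronization.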

Regards,
Matthias


On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
abeysinghecham...@gmail.com> wrote:

> Hi,
> I have sent a pull request for this issue.
> As a next step, could you suggest a new issue, or anything else I should do
> to familiarize myself with the language and runtime for the parameter
> servers project?
>
> Regarding the project proposal, I have a few questions.
> * In the epic there are a few subtasks. Is it enough to focus on a single
> task throughout the summer? Would it have enough workload, or should I go
> for multiple tasks?
> * What is the linkage between subtasks? Do tasks like the distributed Spark
> backend or the local multi-threaded backend need previous tasks completed
> before work can start?
>
> I would be glad if you could suggest some issues related to the distributed
> Spark backend or multi-threaded backend tasks.
>
> Thanks.
> Regards,
> Chamath
>
>
> On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>
>> Hi Chamath,
>>
>> in general, you're absolutely right - you can enable -stats and
>> programmatically probe the heavy hitter statistics for certain opcodes.
>> However, uamin and uamax stand for "unary aggregate minimum" and "unary
>> aggregate maximum", which correspond to min(X) and max(X) on script level.
>> Instead, all generated fused operators are prefixed with spoof or sp_spoof
>> (for distributed Spark operations). The related JUnit assertion should
>> already be in the existing tests; I just mentioned it for completeness.
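
The check described above can be sketched as follows. This is a hypothetical
Python helper, not the JUnit assertion from the existing tests: it assumes the
heavy hitter opcodes have already been collected into a list, and the opcode
suffixes in the usage example are made up. Only the spoof and sp_spoof
prefixes and the uamin/uamax opcodes come from this thread.

```python
# Prefixes of generated fused operators, as named in the message above.
FUSED_PREFIXES = ("spoof", "sp_spoof")


def fused_opcodes(heavy_hitter_opcodes):
    """Return the opcodes that look like generated fused operators."""
    return [op for op in heavy_hitter_opcodes if op.startswith(FUSED_PREFIXES)]


def has_fused_operator(heavy_hitter_opcodes):
    """Success criterion (3): at least one fused operator was executed."""
    return len(fused_opcodes(heavy_hitter_opcodes)) > 0


if __name__ == "__main__":
    # Hypothetical opcode list; "spoofA" and "sp_spoofB" are invented names.
    opcodes = ["uamin", "uamax", "ba+*", "spoofA", "sp_spoofB"]
    print(fused_opcodes(opcodes))
    print(has_fused_operator(["uamin", "uamax"]))
```

A run that only reports uamin and uamax would therefore fail the criterion,
since neither opcode carries one of the two prefixes.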
>>
>> Regards,
>> Matthias
>>
>> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <
>> abeysinghecham...@gmail.com> wrote:
>>
>>> Thanks for your detailed reply.
>>> I did some coding [1] for the issue SYSTEMML-2159 to extend the test cases
>>> for FA & FNR. I have a question regarding the success criterion
>>> "generating at least one fused operator": I think this means I have to
>>> look into the stats of heavy hitter instructions and check whether there
>>> are any fused operators. (My guess is that uamin and uamax are the
>>> operators I have to look for, but I am not sure about this because I
>>> don't know the meaning of these instructions.)
>>>
>>> Please help me clarify this. If my approach is correct, I could send a
>>> PR after fixing the tests for other algorithms. Thanks.
>>>
>>> Regards,
>>> Chamath
>>>
>>> [1] https://github.com/apache/systemml/compare/master...chamathabeysinghe:SYSTEMML-2159?diff=split&name=SYSTEMML-2159
>>>
>>>
>>> On Tue, Feb 27, 2018 at 1:54 AM, Matthias Boehm <mboe...@gmail.com>
>>> wrote:
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Matthias Boehm <mboe...@gmail.com>
>>>> Date: Mon, Feb 26, 2018 at 11:59 AM
>>>> Subject: Re: Extending Codegen algorithm tests for heuristics
>>>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>>>>
>>>>
>>>> Great - thanks for taking this over, Chamath.
>>>>
>>>> In general, I would recommend using this task to explore SystemML a
>>>> little. For example, take one of the codegen algorithm tests from
>>>> org.apache.sysml.test.integration.functions.codegenalg (e.g.,
>>>> AlgorithmL2SVM), pass different flags such as -stats, -explain,
>>>> -explain recompile_hops, and -explain recompile_runtime to programArgs,
>>>> and try to understand the output. If you come across specific questions,
>>>> please just ask.
>>>>
>>>> To answer your detailed questions:
>>>>
>>>> 1) We recently added a code generation framework that automatically
>>>> identifies opportunities for fused operators and subsequently generates
>>>> code for these operators. A major part is the selection of fusion plans,
>>>> for which we provide heuristics and a cost-based optimizer. By default
>>>> (and thus also in our testsuite), we use the cost-based optimizer, but it
>>>> would be good to regularly test the heuristics as well.
>>>>
>>>> 2) You can configure the used optimizer in your SystemML-config.xml
>>>> file as follows:
>>>> <sysml.codegen.optimizer>fuse_all</sysml.codegen.optimizer>
>>>> Valid alternatives are: fuse_all, fuse_no_redundancy, fuse_cost_based,
>>>> and fuse_cost_based_v2 (default). You can provide alternative config xml
>>>> files and switch them dynamically via getConfigTemplateFile.
>>>>
>>>> 3) Similar to the existing tests, it needs to (1) run without errors,
>>>> (2) produce correct results as compared to R, and (3) generate at least
>>>> one fused operator.
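
As a rough sketch of point 2, the following Python snippet builds one minimal
config document per optimizer setting. The root element name and the helper
function are assumptions for illustration; only the sysml.codegen.optimizer
property and its four values come from the message above. How such files are
wired into a test via getConfigTemplateFile is not shown here.

```python
import xml.etree.ElementTree as ET

# The four optimizer settings listed in the message above.
OPTIMIZERS = [
    "fuse_all",
    "fuse_no_redundancy",
    "fuse_cost_based",
    "fuse_cost_based_v2",
]


def build_config(optimizer, root_tag="root"):
    """Return a minimal config XML string that selects the given optimizer.

    root_tag is an assumed name for the enclosing element.
    """
    if optimizer not in OPTIMIZERS:
        raise ValueError("unknown codegen optimizer: " + optimizer)
    root = ET.Element(root_tag)
    prop = ET.SubElement(root, "sysml.codegen.optimizer")
    prop.text = optimizer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name in OPTIMIZERS:
        print(build_config(name))
```

Generating the variants from one list keeps the heuristic and cost-based
configurations in sync, so a test suite can iterate over all of them.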
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>> On Mon, Feb 26, 2018 at 6:54 AM, Chamath Abeysinghe <
>>>> abeysinghecham...@gmail.com> wrote:
>>>>
>>>> > Hi All,
>>>> > As per the guidelines given to GSoC students, I would like to work on
>>>> > the SYSTEMML-2159 [1] issue as a starting point. But I don't understand
>>>> > the background of the issue. Can someone help me understand the context
>>>> > of this issue?
>>>> >
>>>> > A few questions I have:
>>>> >
>>>> > 1) What are fusion heuristics, fuse-all and fuse-no-redundancy?
>>>> > 2) Can I pass those heuristic-related configurations as args to execute
>>>> > DMLScript?
>>>> > 3) What are the success criteria for a test that uses those heuristics?
>>>> >
>>>> > Thank you in advance
>>>> >
>>>> > Regards,
>>>> > Chamath
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/SYSTEMML-2159
>>>> >
>>>> > --
>>>> > Chamath Abeysinghe
>>>> > Department of Computer Science and Engineering
>>>> > University of Moratuwa
>>>> > <https://www.facebook.com/chamath.abeysinghe.3>
>>>> > <https://lk.linkedin.com/in/chamathabeysinghe>
>>>> > Mobile : +94752930548
>>>> >
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
> Chamath Abeysinghe
> Department of Computer Science and Engineering
> University of Moratuwa
> Mobile: +94712803295
>
