---------- Forwarded message ----------
From: Matthias Boehm <mboe...@gmail.com>
Date: Tue, Mar 13, 2018 at 1:00 PM
Subject: Re: Extending Codegen algorithm tests for heuristics
To: Chamath Abeysinghe <abeysinghecham...@gmail.com>

Without debugging it's hard to tell, but usually something like this happens if blocks are incorrectly aligned. So I would recommend simply doing a mapToPair before the RDDAggregateUtils.mergeByKey and validating the correctness of the shifted block indexes. Maybe the newly introduced broadcasts are not shifted into their target positions? For example, consider cbind(A,B) - before aggregation, B needs to be shifted by ncol(A). Furthermore, it would be great to avoid unnecessary aggregation if all but one of the inputs are broadcasts.

Regards,
Matthias

On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:

> Hi Matthias,
> I am working on the SYSTEMML-2169 issue. I have sent a partially completed PR (https://github.com/apache/systemml/pull/747). After those changes, some test cases in NaryRBindTest are failing and I could not understand the reason.
> The test cases fail with the following error:
>
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
>     at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
>     at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
>
> Even after debugging the whole process I could not find a reason for this. If you can give any suggestion that would be really helpful.
>
> If you have any other comments regarding the PR, I can modify the code accordingly.
>
> Thanks,
> Chamath
>
> On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>
>> ---------- Forwarded message ----------
>> From: Matthias Boehm <mboe...@gmail.com>
>> Date: Tue, Mar 6, 2018 at 10:14 PM
>> Subject: Re: Extending Codegen algorithm tests for heuristics
>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>>
>> Hi Chamath,
>>
>> great thanks for your contribution - I left a couple of comments but we should be ready to merge this in soon. If you want to get a better feeling for the distributed Spark backend as well, I created SYSTEMML-2169, which aims to extend our recently added nary cbind/rbind operations to leverage broadcasts when applicable.
>>
>> Regarding the proposal, most of the backends are rather independent, but each backend depends on the language integration. We will help out where necessary. So it depends on your interests and ideas. If you're more interested in defining the language APIs, make this and a simple backend the core of your proposal. If you're more interested in the runtime backends, I would help and add a basic language integration in time, which would allow you to immediately start working on the backends.
>>
>> Following the GSoC guidelines it's usually better to underscope the project than to overscope it, because you want to ensure that you're able to successfully complete the project in the ambitious timeframe and there will always be unforeseen obstacles. I would recommend defining a core project and potential extensions you will address if time allows. For example, the local, multi-threaded backend can indeed be realized relatively quickly. However, subsequently we can add and experiment with Hogwild! (i.e., unsynchronized updates), which is known to work well for sparse models, replication and partitioning in NUMA settings, and potentially the automatic selection of update strategies.
>>
>> Regards,
>> Matthias
>>
>> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
>>
>> > Hi,
>> > I have sent a pull request for this issue.
>> > As a next step, could you suggest a new issue, or anything else I should do to familiarize myself with the language and runtime for the parameter server project?
>> >
>> > Regarding writing the project proposal, I have a few questions.
>> > * The epic has several subtasks. Is it enough to focus on a single task throughout the summer? Would that be enough workload, or should I go for multiple tasks?
>> > * What is the linkage between subtasks? Do tasks like the distributed Spark backend or the local multi-threaded backend need previous tasks completed before work can start?
>> >
>> > I would be glad if you could suggest some issues related to the distributed Spark backend or multi-threaded backend tasks.
>> >
>> > Thanks.
>> > Regards,
>> > Chamath
>> >
>> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>> >
>> >> Hi Chamath,
>> >>
>> >> in general, you're absolutely right - you can enable -stats and programmatically probe the heavy hitter statistics for certain opcodes. However, uamin and uamax stand for "unary aggregate minimum" and "unary aggregate maximum", which correspond to min(X) and max(X) on script level. Instead, all generated fused operators are prefixed with spoof or sp_spoof (for distributed Spark operations). The related junit assertion should already be in the existing tests, I just mentioned it for completeness.
>> >>
>> >> Regards,
>> >> Matthias
>> >>
>> >> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
>> >>
>> >>> Thanks for your detailed reply.
>> >>> I did some coding [1] for this issue, SYSTEMML-2159, to extend the test cases for FA & FNR. I have a question regarding the success criteria, specifically the "generating at least one fused operator" condition. I think this means I have to look into the stats of heavy hitter instructions and check whether there are any fused operators. (My guess is that uamin and uamax are the operators I have to look for, but I am not sure about this because I don't know the meaning of these instructions.)
>> >>>
>> >>> Please help me to clarify this. If my approach is correct I could send a PR after fixing tests for other algorithms. Thanks.
>> >>>
>> >>> Regards,
>> >>> Chamath
>> >>>
>> >>> [1] https://github.com/apache/systemml/compare/master...chamathabeysinghe:SYSTEMML-2159?diff=split&name=SYSTEMML-2159
>> >>>
>> >>> On Tue, Feb 27, 2018 at 1:54 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>> >>>
>> >>>> ---------- Forwarded message ----------
>> >>>> From: Matthias Boehm <mboe...@gmail.com>
>> >>>> Date: Mon, Feb 26, 2018 at 11:59 AM
>> >>>> Subject: Re: Extending Codegen algorithm tests for heuristics
>> >>>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>> >>>>
>> >>>> great - thanks for taking this over Chamath.
>> >>>>
>> >>>> In general, I would recommend using this task to explore SystemML a little. For example, take one of the codegen algorithm tests from org.apache.sysml.test.integration.functions.codegenalg (e.g., AlgorithmL2SVM) and pass different flags such as -stats, -explain, -explain recompile_hops, -explain recompile_runtime to programArgs and try to understand the output. If you come across specific questions, please just ask.
>> >>>>
>> >>>> To answer your detailed questions:
>> >>>>
>> >>>> 1) We recently added a code generation framework that automatically identifies opportunities for fused operators and subsequently generates code for these operators. A major part is the selection of fusion plans, for which we provide heuristics and a cost-based optimizer. By default (and thus also in our testsuite), we use the cost-based optimizer, but it would be good to regularly test the heuristics as well.
>> >>>>
>> >>>> 2) You can configure the used optimizer in your SystemML-config.xml file as follows:
>> >>>> <sysml.codegen.optimizer>fuse_all</sysml.codegen.optimizer>
>> >>>> Valid alternatives are: fuse_all, fuse_no_redundancy, fuse_cost_based, and fuse_cost_based_v2 (default). You can provide alternative config xml files and switch them dynamically via getConfigTemplateFile.
>> >>>>
>> >>>> 3) Similar to the existing tests, it needs to (1) run without errors, (2) produce correct results as compared to R, and (3) generate at least one fused operator.
>> >>>>
>> >>>> Regards,
>> >>>> Matthias
>> >>>>
>> >>>> On Mon, Feb 26, 2018 at 6:54 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
>> >>>>
>> >>>> > Hi All,
>> >>>> > As per the guidelines given to GSoC students, I would like to work on the SYSTEMML-2159 [1] issue as a starting point. But I don't understand the background of the issue. Can someone help me understand the context of this issue?
>> >>>> >
>> >>>> > A few questions I have:
>> >>>> >
>> >>>> > 1) What are the fusion heuristics fuse-all and fuse-no-redundancy?
>> >>>> > 2) Can I pass those heuristic-related configurations as args when executing DMLScript?
>> >>>> > 3) What are the success criteria for a test that uses those heuristics?
>> >>>> >
>> >>>> > Thank you in advance
>> >>>> >
>> >>>> > Regards,
>> >>>> > Chamath
>> >>>> >
>> >>>> > [1] https://issues.apache.org/jira/browse/SYSTEMML-2159
>> >>>> >
>> >>>> > --
>> >>>> > Chamath Abeysinghe
>> >>>> > Department of Computer Science and Engineering
>> >>>> > University of Moratuwa
>> >>>> > Mobile : +94752930548
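
A note on the block-index shift suggested in the most recent reply for cbind(A,B): the following is a minimal sketch, in plain Python rather than the SystemML/Spark API, of how B's block keys could be shifted past A's column blocks and then checked for collisions before any merge by key. The function names, the 1-based (row block, column block) keys, the matrix sizes, and the assumption that ncol(A) is a multiple of the block size are all illustrative assumptions and are not taken from the SystemML code base.

```python
def shift_cbind_keys(keys_b, ncol_a, blen):
    """Shift B's 1-based (row_block, col_block) keys past A's column blocks.

    Hypothetical helper: assumes ncol_a is a multiple of blen, so that B's
    blocks land on whole block positions and need no splitting.
    """
    if ncol_a % blen != 0:
        raise ValueError("ncol(A) is not block-aligned; B's blocks would need splitting")
    offset = ncol_a // blen  # number of column blocks occupied by A
    return [(r, c + offset) for (r, c) in keys_b]


def colliding_keys(keys_a, keys_b_shifted):
    """Return keys present in both inputs; for an aligned cbind this should be empty."""
    return sorted(set(keys_a) & set(keys_b_shifted))


# A: 2000 x 1000 and B: 2000 x 500 with 1000 x 1000 blocks (hypothetical sizes)
keys_a = [(1, 1), (2, 1)]
keys_b = [(1, 1), (2, 1)]

shifted = shift_cbind_keys(keys_b, ncol_a=1000, blen=1000)
print(shifted)                          # [(1, 2), (2, 2)]
print(colliding_keys(keys_a, shifted))  # []
print(colliding_keys(keys_a, keys_b))   # [(1, 1), (2, 1)]
```

The last line shows what the check is meant to catch: if B's keys reach the merge unshifted, both inputs map to the same keys, which could be one way for a merge by key to encounter blocks of different sizes. For rbind the same idea would presumably apply to the row-block index instead of the column-block index.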