Hi,

Thanks for the tip. I solved the bug and opened a new PR for SYSTEMML-2169.

Regards,
Chamath

On Wed, Mar 14, 2018 at 7:42 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> ---------- Forwarded message ----------
> From: Matthias Boehm <mboe...@gmail.com>
> Date: Tue, Mar 13, 2018 at 1:00 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>
>
> without debugging it's hard to tell, but usually something like this
> happens if blocks are incorrectly aligned. So I would recommend to simply
> do a mapToPair before the RDDAggregateUtils.mergeByKey and validate the
> correctness of shifted block indexes. Maybe the newly introduced broadcasts
> are not shifted into their target positions? For example, consider
> cbind(A,B) - before aggregation, B needs to be shifted by ncol(A).
> Furthermore, it would be great to avoid unnecessary aggregation if all but
> one inputs are broadcasts.
>
> Regards,
> Matthias
>
> On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
>
> > Hi Matthias,
> > I am working on SYSTEMML-2169 issue. I have sent a partially completed PR
> > ( https://github.com/apache/systemml/pull/747 ). After those changes,
> > some test cases in NaryRBindTest, are failing and I could not understand
> > the reason.
> > Test cases are failing with following error
> >
> > Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
> >     at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
> >     at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
> >
> > Even after debugging the whole process I could not find a reason for this.
> > If you can give any suggestion that would be really helpful.
> >
> > If you have any other comment regarding the PR I could modify code
> > according to that.
> >
> > Thanks,
> > Chamath
> >
> >
> > On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >
> >> ---------- Forwarded message ----------
> >> From: Matthias Boehm <mboe...@gmail.com>
> >> Date: Tue, Mar 6, 2018 at 10:14 PM
> >> Subject: Re: Extending Codegen algorithm tests for heuristics
> >> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
> >>
> >>
> >> Hi Chamath,
> >>
> >> great thanks for your contribution - I left a couple of comments but we
> >> should be ready to merge this in soon. If you want to get a better feeling
> >> for the distributed spark backend as well, I created SYSTEMML-2169, which
> >> aims to extend our recently added nary cbind/rbind operations to leverage
> >> broadcasts when applicable.
> >>
> >> Regarding the proposal, most of the backends are rather independent, but
> >> each backend depends on the language integration. We will help out where
> >> necessary. So it depends on your interests and ideas. If you're more
> >> interested in defining the language APIs, make this and a simple backend
> >> the core of your proposal. If you're more interested in the runtime
> >> backends, I would help and add a basic language integration in time, which
> >> would allow you to immediately start working on the backends.
> >>
> >> Following the GSoC guidelines it's usually better to underscope the project
> >> than overscope it because you want to ensure that you're able to
> >> successfully complete the project in the ambitious timeframe and there will
> >> always be unforeseen obstacles. I would recommend to define a core project
> >> and potential extensions you will address if time allows. For example, the
> >> local, multi-threaded backend can indeed be realized relatively quickly.
> >> However, subsequently we can add and experiment with Hogwild! (i.e.,
> >> unsynchronized updates) which is known to work well for sparse models,
> >> replication and partitioning in NUMA settings, and potentially the
> >> automatic selection of update strategies.
> >>
> >> Regards,
> >> Matthias
> >>
> >>
> >> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
> >>
> >> > Hi,
> >> > I have sent a pull request for this issue.
> >> > As a next step, could you suggest any new issue? Or anything I have to do
> >> > to familiarize myself with the language and runtime for the parameter
> >> > servers project?
> >> >
> >> > And regarding writing the project proposal I have a few questions.
> >> > * In the epic there are a few sub tasks; is it enough to focus on a single
> >> >   task throughout the summer? Would it have enough workload or should I go
> >> >   for multiple tasks?
> >> > * What is the linkage between sub tasks? Do tasks like Distributed Spark
> >> >   Back-end or Local multi threaded back ends need previous tasks completed
> >> >   before starting work?
> >> >
> >> > I would be glad if you could suggest some issues related to the Distributed
> >> > spark back-end or multi threaded backend tasks.
> >> >
> >> > Thanks.
> >> > Regards,
> >> > Chamath
> >> >
> >> >
> >> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >> >
> >> >> Hi Chamath,
> >> >>
> >> >> in general, you're absolutely right - you can enable -stats and
> >> >> programmatically probe the heavy hitter statistics for certain opcodes.
> >> >> However, uamin and uamax stand for "unary aggregate minimum" and "unary
> >> >> aggregate maximum" which correspond to min(X) and max(X) on script level.
> >> >> Instead all generated fused operators are prefixed with spoof or sp_spoof
> >> >> (for distributed spark operations). The related junit assertion should
> >> >> already be in the existing tests, I just mentioned it for completeness.
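Matthias's point that generated fused operators are recognizable by their `spoof` / `sp_spoof` opcode prefix can be turned into a small check for success criterion (3). The Python sketch below is a hedged illustration, not SystemML's JUnit helper: it assumes the heavy-hitter opcodes have already been collected into a list of strings (for example by scraping the -stats output), and the opcode names in the demo lines are illustrative only.

```python
from typing import Iterable, List

# Opcode prefixes of generated fused operators, as stated in the thread:
# "spoof" for local operators, "sp_spoof" for distributed Spark operators.
FUSED_PREFIXES = ("spoof", "sp_spoof")


def fused_operators(opcodes: Iterable[str]) -> List[str]:
    """Return the opcodes that denote generated fused operators.

    `opcodes` is an assumed, pre-collected list of heavy-hitter opcode names;
    how it is obtained (e.g. parsing -stats output) is outside this sketch.
    """
    return [op for op in opcodes if op.startswith(FUSED_PREFIXES)]


def has_fused_operator(opcodes: Iterable[str]) -> bool:
    """Criterion (3) from the thread: at least one fused operator was generated."""
    return len(fused_operators(opcodes)) > 0


if __name__ == "__main__":
    # uamin/uamax are ordinary unary aggregates (min(X), max(X)), not fused operators.
    print(has_fused_operator(["uamin", "uamax"]))
    # "sp_spoofRA" is an illustrative opcode name, assumed for this demo.
    print(has_fused_operator(["uamax", "sp_spoofRA"]))
```

Matching on prefixes rather than on a fixed list of opcodes keeps the check independent of which fused-operator template the optimizer happens to pick.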
> >> >>
> >> >> Regards,
> >> >> Matthias
> >> >>
> >> >> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
> >> >>
> >> >>> Thanks for your detailed reply.
> >> >>> I did some coding [1] for this issue SYSTEMML-2159 to extend test cases
> >> >>> for FA & FNR. I got a problem regarding the success criterion, the
> >> >>> "generating at least one fused operator" condition. I think this means I
> >> >>> have to look into the stats of heavy hitter instructions and check if
> >> >>> there are any fused operators. (My guess is uamin and uamax are the
> >> >>> operators I have to look for, but I am not sure about this because I
> >> >>> don't know the meaning of these instructions.)
> >> >>>
> >> >>> Please help me to clarify this. If my approach is correct I could send a
> >> >>> PR after fixing tests for other algorithms. Thanks.
> >> >>>
> >> >>> Regards,
> >> >>> Chamath
> >> >>>
> >> >>> [1] https://github.com/apache/systemml/compare/master...chamathabeysinghe:SYSTEMML-2159?diff=split&name=SYSTEMML-2159
> >> >>>
> >> >>>
> >> >>> On Tue, Feb 27, 2018 at 1:54 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >> >>>
> >> >>>> ---------- Forwarded message ----------
> >> >>>> From: Matthias Boehm <mboe...@gmail.com>
> >> >>>> Date: Mon, Feb 26, 2018 at 11:59 AM
> >> >>>> Subject: Re: Extending Codegen algorithm tests for heuristics
> >> >>>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
> >> >>>>
> >> >>>>
> >> >>>> great - thanks for taking this over Chamath.
> >> >>>>
> >> >>>> In general, I would recommend to use this task to explore SystemML a
> >> >>>> little.
> >> >>>> For example, take one of the codegen algorithm tests from
> >> >>>> org.apache.sysml.test.integration.functions.codegenalg (e.g.,
> >> >>>> AlgorithmL2SVM) and pass different flags such as -stats, -explain,
> >> >>>> -explain recompile_hops, -explain recompile_runtime to programArgs and
> >> >>>> try to understand the output. If you come across specific questions,
> >> >>>> please just ask.
> >> >>>>
> >> >>>> To answer your detailed questions:
> >> >>>>
> >> >>>> 1) We recently added a code generation framework that automatically
> >> >>>> identifies opportunities for fused operators and subsequently generates
> >> >>>> code for these operators. A major part is the selection of fusion plans,
> >> >>>> for which we provide heuristics and a cost-based optimizer. By default
> >> >>>> (and thus also in our testsuite), we use the cost-based optimizer, but it
> >> >>>> would be good to regularly test the heuristics as well.
> >> >>>>
> >> >>>> 2) You can configure the used optimizer in your SystemML-config.xml file
> >> >>>> as follows:
> >> >>>> <sysml.codegen.optimizer>fuse_all</sysml.codegen.optimizer>
> >> >>>> Valid alternatives are: fuse_all, fuse_no_redundancy, fuse_cost_based,
> >> >>>> and fuse_cost_based_v2 (default). You can provide alternative config xml
> >> >>>> files and switch them dynamically via getConfigTemplateFile.
> >> >>>>
> >> >>>> 3) Similar to the existing tests, it needs to (1) run without errors,
> >> >>>> (2) produce correct results as compared to R, and (3) generate at least
> >> >>>> one fused operator.
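Point 2 suggests keeping one config file per fusion-plan optimizer and switching between them in the test. The Python sketch below generates such files. It is a hedged sketch under stated assumptions: the properties are assumed to sit directly under a `<root>` element (check this against the actual SystemML-config.xml), the file names are hypothetical, and any other properties the existing codegen test config already sets would have to be passed in through the `extra` argument.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Fusion-plan optimizers named in the thread; fuse_cost_based_v2 is the default.
OPTIMIZERS = ("fuse_all", "fuse_no_redundancy", "fuse_cost_based", "fuse_cost_based_v2")


def write_codegen_config(path: str, optimizer: str,
                         extra: Optional[Dict[str, str]] = None) -> str:
    """Write a config file that selects one fusion-plan optimizer.

    Assumption: properties are direct children of a <root> element.
    `extra` carries any other properties the existing test config needs.
    """
    if optimizer not in OPTIMIZERS:
        raise ValueError(f"unknown codegen optimizer: {optimizer}")
    root = ET.Element("root")
    # The optimizer is set last so an entry in `extra` cannot override it.
    props = {**(extra or {}), "sysml.codegen.optimizer": optimizer}
    for key, value in props.items():
        ET.SubElement(root, key).text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def write_all_configs(directory: str) -> Dict[str, str]:
    """Write one config per optimizer; the file names are hypothetical."""
    return {
        opt: write_codegen_config(
            os.path.join(directory, f"SystemML-config-codegen-{opt}.xml"), opt)
        for opt in OPTIMIZERS
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for opt, path in write_all_configs(d).items():
            root = ET.parse(path).getroot()
            print(opt, {child.tag: child.text for child in root})
```

In the Java test, the intent would be for getConfigTemplateFile to return whichever of these files matches the heuristic under test, so the same algorithm test can run once per optimizer.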
> >> >>>>
> >> >>>> Regards,
> >> >>>> Matthias
> >> >>>>
> >> >>>> On Mon, Feb 26, 2018 at 6:54 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
> >> >>>>
> >> >>>> > Hi All,
> >> >>>> > As per the guidelines given to GSoC students, I would like to work on
> >> >>>> > the SYSTEMML-2159 [1] issue as a starting point. But I don't understand
> >> >>>> > the background of the issue. Can someone help me with understanding the
> >> >>>> > context of this issue?
> >> >>>> >
> >> >>>> > Few problems I got are,
> >> >>>> >
> >> >>>> > 1) What are fusion heuristics, fuse-all and fuse-no-redundancy?
> >> >>>> > 2) Can I pass those heuristic related configurations as args to
> >> >>>> > execute DMLScript?
> >> >>>> > 3) What is the success criteria for a test that use those heuristics?
> >> >>>> >
> >> >>>> > Thank you in advance
> >> >>>> >
> >> >>>> > Regards,
> >> >>>> > Chamath
> >> >>>> >
> >> >>>> > [1] https://issues.apache.org/jira/browse/SYSTEMML-2159
> >> >>>> >
> >> >>>> > --
> >> >>>> > Chamath Abeysinghe
> >> >>>> > Department of Computer Science and Engineering
> >> >>>> > University of Moratuwa
> >> >>>> > <https://www.facebook.com/chamath.abeysinghe.3>
> >> >>>> > <https://lk.linkedin.com/in/chamathabeysinghe>
> >> >>>> > Mobile : +94752930548
> >> >>>> >
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> > --
> >> > Chamath Abeysinghe
> >> > Department of Computer Science and Engineering
> >> > University of Moratuwa
> >> > Mobile: +94712803295
> >> >
> >

--
Chamath Abeysinghe
Department of Computer Science and Engineering
University of Moratuwa
Mobile: +94712803295
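The misalignment Matthias describes in his Mar 13 reply (for cbind(A,B), B has to be shifted by ncol(A) before aggregation) comes down to block-index arithmetic. The Python sketch below models only that arithmetic for the column dimension; rbind, as in NaryRBindTest, is the same computation on row blocks. It is not SystemML's Spark implementation: `Piece`, `shift_block_cols`, `shift_input` and `output_block_ncols` are hypothetical names, and 1-based block indexes with a fixed block size `blen` are assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    """Part of one input column block that lands in one output column block."""
    target_block: int  # 1-based column-block index in the cbind output
    src_start: int     # first column inside the source block (0-based)
    dst_start: int     # first column inside the target block (0-based)
    ncols: int         # number of columns copied


def shift_block_cols(src_block: int, src_ncols: int, offset: int, blen: int) -> List[Piece]:
    """Place column block `src_block` (1-based, `src_ncols` wide) of an input at
    column `offset` of the output, e.g. offset = ncol(A) for B in cbind(A, B)."""
    pieces = []
    start = offset + (src_block - 1) * blen  # first output column (0-based)
    copied = 0
    while copied < src_ncols:
        col = start + copied
        dst_start = col % blen
        # Copy up to the end of the current output block or the source block.
        n = min(blen - dst_start, src_ncols - copied)
        pieces.append(Piece(col // blen + 1, copied, dst_start, n))
        copied += n
    return pieces


def shift_input(ncols: int, offset: int, blen: int) -> List[Piece]:
    """Shift all column blocks of an input with `ncols` columns by `offset`."""
    pieces = []
    nblocks = -(-ncols // blen)  # ceiling division
    for b in range(1, nblocks + 1):
        width = min(blen, ncols - (b - 1) * blen)
        pieces.extend(shift_block_cols(b, width, offset, blen))
    return pieces


def output_block_ncols(target_block: int, total_ncols: int, blen: int) -> int:
    """Width every piece of an output block must be padded to before merging."""
    return min(blen, total_ncols - (target_block - 1) * blen)


if __name__ == "__main__":
    # cbind(A, B) with ncol(A) = 1500, ncol(B) = 1200 and a block size of 1000.
    for p in shift_input(1200, offset=1500, blen=1000):
        print(p)
```

In this example, B's first block is split across output blocks 2 and 3, and output block 2 also receives A's last 500 columns. Both contributions have to be padded to the full output width before they are merged by key; merging pieces of different sizes is plausibly the situation behind the "Mismatched block sizes" error in the thread. When every offset is a multiple of `blen`, each input block maps to exactly one output block, so no two inputs share an output index and the aggregation Matthias suggests avoiding is unnecessary.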