Pat, I agree that proposal is not ideal, and your points are of course valid. All I'm saying is that solving the code-vs-test module question is a separate issue, not a non-issue. However, it is independent of the "right location of cf code" problem.
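For concreteness, here is a minimal sketch of what the engine-agnostic half of that split could look like, assuming Mahout's R-like DRM DSL; the package, object, and method names below are illustrative, not the committed code:

// math-scala side (sketch): builds only the logical operator graph, so the
// module needs no Spark dependency at compile time.
package org.apache.mahout.math.cf

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

object CooccurrenceSketch {
  // A'A expressed through the distributed R-like DSL; which engine runs it
  // is decided later, when the plan is optimized and executed.
  def cooccurrence(drmA: DrmLike[Int]): DrmLike[Int] = drmA.t %*% drmA
}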
Here's a PR for just the code move: https://github.com/apache/mahout/pull/26

On Wed, Jul 9, 2014 at 8:44 AM, Pat Ferrel <[email protected]> wrote:

> Hmm, that doesn’t seem like a good idea. Since there is precedent, and for
> the sake of argument, I’ll go ahead and do it, but:
>
> 1) it means the wrong module will fail a build test when the error is not
> in the test
> 2) it is a kind of lie about the dependencies of a module. A consumer
> would think they can include only math-scala in a project, but some
> ill-defined parts of it are useless without spark, so no real separation
> can be made. I understand that this is so some hypothetical future engine
> module can replace spark, but it would have to come with an awful lot of
> stuff, including many of the build tests for math-scala. This only adds to
> my concern over this approach and will result in the real and current
> implementation on Spark being misleading and confusing in its structure.
>
> But as I said, for the sake of avoiding further argument, I’ll separate
> impl from test.
>
> On Jul 8, 2014, at 6:42 PM, Anand Avati <[email protected]> wrote:
>
> If that is the case, why not commit so much already (i.e., separate modules
> for code and test), since that has been the "norm" thus far (see DSSVD,
> DSPCA, etc.)? Fixing code vs test modules could be a separate task/activity
> (which I'm happy to pick up) on which the cf code move need not be
> dependent.
>
>
> On Tue, Jul 8, 2014 at 6:14 PM, Pat Ferrel <[email protected]> wrote:
>
>> I already did the code and tests in separate modules; that works but is
>> not a good way to go, imo. If there are tests that will work in
>> math-scala, then we can put the code in math-scala. I couldn’t find a way
>> to do it.
>>
>>
>> On Jul 8, 2014, at 4:40 PM, Anand Avati <[email protected]> wrote:
>>
>> I'm not completely sure how to address this (code and tests in separate
>> modules) as I write, but I will give it a shot soon.
>>
>>
>> On Mon, Jul 7, 2014 at 9:18 AM, Pat Ferrel <[email protected]> wrote:
>>
>> > OK, I’m spending more time on this than I have to spare. The test class
>> > extends MahoutLocalContext, which provides an implicit Spark context. I
>> > haven’t found a way to test parallel execution of cooccurrence without
>> > it. So far the only obvious option is to put cf into math-scala, but
>> > the tests would have to remain in spark, and that seems like trouble,
>> > so I’d rather not do that.
>> >
>> > I suspect that as more math-scala-consuming algos get implemented, this
>> > issue will proliferate. We will have implementations that do not
>> > require Spark but tests that do. We could create a new sub-project that
>> > allows for this, I suppose, but a new sub-project will require changes
>> > to SparkEngine and mahout’s script.
>> >
>> > If someone (Anand?) wants to offer a PR with some way around this, I’d
>> > be happy to integrate it.
>> >
>> > On Jun 30, 2014, at 5:39 PM, Pat Ferrel <[email protected]> wrote:
>> >
>> > No argument, just trying to decide whether to create core-scala or keep
>> > dumping anything not Spark-dependent in math-scala.
>> >
>> > On Jun 30, 2014, at 9:32 AM, Ted Dunning <[email protected]> wrote:
>> >
>> > On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel <[email protected]>
>> > wrote:
>> >
>> >> Speaking for Sebastian and Dmitriy (with some ignorance), I think the
>> >> idea was to isolate things with Spark dependencies, something like we
>> >> did before with Hadoop.
>> >
>> >
>> > Go ahead and speak for me as well here!
>> >
>> > I think isolating the dependencies is crucial for platform nimbleness
>> > (nimbility?)
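To make Pat's MahoutLocalContext point above concrete: the test for code like the cooccurrence sketch earlier has to live in the spark module, because actually materializing a DRM needs a real engine, and MahoutLocalContext supplies the implicit Spark-backed context. A minimal sketch, assuming the trait lives under org.apache.mahout.sparkbindings.test and mixes into a ScalaTest suite (both assumptions):

// spark-module test (sketch): MahoutLocalContext provides the implicit
// Spark-backed DistributedContext that drmParallelize and collect require.
package org.apache.mahout.cf

import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.sparkbindings.test.MahoutLocalContext
import org.scalatest.FunSuite

class CooccurrenceSketchSuite extends FunSuite with MahoutLocalContext {
  test("A'A of a tiny identity matrix") {
    val inCoreA = dense((1, 0), (0, 1))
    val drmA = drmParallelize(inCoreA)         // needs the implicit context
    val ata = CooccurrenceSketch.cooccurrence(drmA).collect  // runs on Spark
    assert(ata(0, 0) == 1.0 && ata(0, 1) == 0.0)
  }
}

Nothing in the code under test refers to Spark; only this suite does, which is exactly why the two end up in different modules.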
