Pat, I agree that proposal is not ideal, and your points are of course valid. All I'm saying is that solving the code-vs-test module question is a separate issue, not a non-issue. However, it is independent of the "right location of cf code" problem.
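For concreteness, here is a minimal sketch of what the engine-agnostic half of that split could look like, assuming Mahout's R-like DRM DSL; the package, object, and method names below are illustrative, not the committed code:

// math-scala side (sketch): builds only the logical operator graph, so the
// module needs no Spark dependency at compile time.
package org.apache.mahout.math.cf

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

object CooccurrenceSketch {
  // A'A expressed through the distributed R-like DSL; which engine runs it
  // is decided later, when the plan is optimized and executed.
  def cooccurrence(drmA: DrmLike[Int]): DrmLike[Int] = drmA.t %*% drmA
}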
Here's a PR for just the code move: https://github.com/apache/mahout/pull/26

On Wed, Jul 9, 2014 at 8:44 AM, Pat Ferrel <[email protected]> wrote:

> Hmm, that doesn’t seem like a good idea. Since there is precedent, and for
> the sake of argument, I’ll go ahead and do it, but:
>
> 1) it means the wrong module will fail a build test when the error is not
> in the test
> 2) it is a kind of lie about the dependencies of a module. A consumer
> would think they can include only math-scala in a project, but some
> ill-defined parts of it are useless without spark, so no real separation
> can be made. I understand that this is so some hypothetical future engine
> module can replace spark, but it would have to come with an awful lot of
> stuff, including many of the build tests for math-scala. This only adds to
> my concern over this approach and will result in the real and current
> implementation on Spark being misleading and confusing in its structure.
>
> But as I said, for the sake of avoiding further argument, I’ll separate
> impl from test.
>
> On Jul 8, 2014, at 6:42 PM, Anand Avati <[email protected]> wrote:
>
> If that is the case, why not commit so much already (i.e., separate modules
> for code and test), since that has been the "norm" thus far (see DSSVD,
> DSPCA, etc.)? Fixing code vs test modules could be a separate task/activity
> (which I'm happy to pick up) on which the cf code move need not be
> dependent.
>
>
> On Tue, Jul 8, 2014 at 6:14 PM, Pat Ferrel <[email protected]> wrote:
>
>> I already did the code and tests in separate modules; that works but is
>> not a good way to go, imo. If there are tests that will work in
>> math-scala, then we can put the code in math-scala. I couldn’t find a way
>> to do it.
>>
>>
>> On Jul 8, 2014, at 4:40 PM, Anand Avati <[email protected]> wrote:
>>
>> I'm not completely sure how to address this (code and tests in separate
>> modules) as I write, but I will give it a shot soon.
>>
>>
>> On Mon, Jul 7, 2014 at 9:18 AM, Pat Ferrel <[email protected]> wrote:
>>
>> > OK, I’m spending more time on this than I have to spare. The test class
>> > extends MahoutLocalContext, which provides an implicit Spark context. I
>> > haven’t found a way to test parallel execution of cooccurrence without
>> > it. So far the only obvious option is to put cf into math-scala, but
>> > the tests would have to remain in spark, and that seems like trouble,
>> > so I’d rather not do that.
>> >
>> > I suspect that as more math-scala-consuming algos get implemented, this
>> > issue will proliferate. We will have implementations that do not
>> > require Spark but tests that do. We could create a new sub-project that
>> > allows for this, I suppose, but a new sub-project will require changes
>> > to SparkEngine and mahout’s script.
>> >
>> > If someone (Anand?) wants to offer a PR with some way around this, I’d
>> > be happy to integrate it.
>> >
>> > On Jun 30, 2014, at 5:39 PM, Pat Ferrel <[email protected]> wrote:
>> >
>> > No argument, just trying to decide whether to create core-scala or keep
>> > dumping anything not Spark-dependent in math-scala.
>> >
>> > On Jun 30, 2014, at 9:32 AM, Ted Dunning <[email protected]> wrote:
>> >
>> > On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel <[email protected]>
>> > wrote:
>> >
>> >> Speaking for Sebastian and Dmitriy (with some ignorance), I think the
>> >> idea was to isolate things with Spark dependencies, something like we
>> >> did before with Hadoop.
>> >
>> >
>> > Go ahead and speak for me as well here!
>> >
>> > I think isolating the dependencies is crucial for platform nimbleness
>> > (nimbility?)
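To make Pat's MahoutLocalContext point above concrete: the test for code like the cooccurrence sketch earlier has to live in the spark module, because actually materializing a DRM needs a real engine, and MahoutLocalContext supplies the implicit Spark-backed context. A minimal sketch, assuming the trait lives under org.apache.mahout.sparkbindings.test and mixes into a ScalaTest suite (both assumptions):

// spark-module test (sketch): MahoutLocalContext provides the implicit
// Spark-backed DistributedContext that drmParallelize and collect require.
package org.apache.mahout.cf

import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.sparkbindings.test.MahoutLocalContext
import org.scalatest.FunSuite

class CooccurrenceSketchSuite extends FunSuite with MahoutLocalContext {
  test("A'A of a tiny identity matrix") {
    val inCoreA = dense((1, 0), (0, 1))
    val drmA = drmParallelize(inCoreA)         // needs the implicit context
    val ata = CooccurrenceSketch.cooccurrence(drmA).collect  // runs on Spark
    assert(ata(0, 0) == 1.0 && ata(0, 1) == 0.0)
  }
}

Nothing in the code under test refers to Spark; only this suite does, which is exactly why the two end up in different modules.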
