[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030843#comment-14030843
 ] 

ASF GitHub Bot commented on MAHOUT-1464:
----------------------------------------

GitHub user pferrel opened a pull request:

    https://github.com/apache/mahout/pull/18

    MAHOUT-1464

    The numNonZeroElementsPerColumn additions did not account for negative
    values; they counted only the positive non-zero values. Fixed this in both
    the in-core and distributed cases.
    
    I added Functions.notEqual to Functions.java. It may be possible to do
    this with the existing functions, but it wasn't obvious how, so I wrote a
    new one. The test is in MatrixOpsSuite, where it is used.
    
    The distributed case was much simpler.
    
    Changed tests to include negative values.
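
A minimal sketch of the difference between the two counting rules described above. It uses plain Java arrays rather than Mahout's matrix types, and the class name and both helper methods are hypothetical illustrations, not the code in this pull request. Comparing with `> 0` misses negative entries; comparing with `!= 0` counts them.

```java
import java.util.Arrays;

public class NonZeroColumnCounts {

    // Hypothetical helper (not Mahout's API): counts entries that are != 0
    // in each column, so negative values are included.
    static int[] numNonZeroElementsPerColumn(double[][] matrix) {
        int cols = matrix.length == 0 ? 0 : matrix[0].length;
        int[] counts = new int[cols];
        for (double[] row : matrix) {
            for (int c = 0; c < cols; c++) {
                if (row[c] != 0.0) {
                    counts[c]++;
                }
            }
        }
        return counts;
    }

    // Hypothetical illustration of the bug: a "> 0" test counts only the
    // positive entries and silently skips the negative ones.
    static int[] numPositiveElementsPerColumn(double[][] matrix) {
        int cols = matrix.length == 0 ? 0 : matrix[0].length;
        int[] counts = new int[cols];
        for (double[] row : matrix) {
            for (int c = 0; c < cols; c++) {
                if (row[c] > 0.0) {
                    counts[c]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] m = {
            { 1, -2, 0},
            { 0,  3, 0},
            {-4,  0, 0}
        };
        // prints [2, 2, 0]
        System.out.println(Arrays.toString(numNonZeroElementsPerColumn(m)));
        // prints [1, 1, 0]
        System.out.println(Arrays.toString(numPositiveElementsPerColumn(m)));
    }
}
```

The two results differ in exactly the columns that hold a negative value, which is why the tests were changed to include negative values.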

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pferrel/mahout mahout-1464

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit 107a0ba9605241653a85b113661a8fa5c055529f
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-04T19:54:22Z

    added Sebastian's CooccurrenceAnalysis patch updated it to use current 
Mahout-DSL

commit 16c03f7fa73c156859d1dba3a333ef9e8bf922b0
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-04T21:32:18Z

    added Sebastian's MurmurHash changes
    
    Signed-off-by: pferrel <p...@occamsmachete.com>

commit c6adaa44c80bba99d41600e260bbb1ad5c972e69
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-05T16:52:23Z

    MAHOUT-1464 import cleanup, minor changes to examples for running on Spark 
Cluster

commit 1d66e5726e71e297ef4a7a27331463ba363098c0
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:19:32Z

    scalatest for cooccurrence cross and self along with other
    CooccurrenceAnalysis methods

commit 766db0f9e7feb70520fbd444afcb910788f01e76
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:20:46Z

    Merge branch 'master' into mahout-1464

commit e492976688cb8860354bb20a362d370405f560e1
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:50:07Z

    cleaned up test comments

commit a49692eb1664de4b15de1864b95701a6410c80c8
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T21:09:55Z

    got those cursed .DS_Stores out of the branch and put an exclude in 
.gitignore

commit 268290d28d4f83cc47a7e6baebc5eb4c53d7c8da
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-07T21:50:04Z

    Merge branch 'master' into mahout-1464

commit 63b10704390e18f513cca30596b1d25e146a6edd
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-08T15:26:36Z

    Merge branch 'master' into mahout-1464

commit ac00d7655c4cba5f6c6dcb4882be95656b17a834
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-09T14:11:43Z

    Merge branch 'master' into mahout-1464

commit fb008efeae3d5f6f6ba350fbc2ef3944da1dcaef
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T02:17:27Z

    added 'colCounts' to a drm using the SparkEngine and MatrixOps, which, when 
used in cooccurrence, fixes the problem with non-boolean preference values

commit 5b04cb31403e2521d9874ad5e14f28cd0af26c26
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T02:18:29Z

    Merge branch 'master' into mahout-1464

commit e451a2a596f5ceda8d1b4990e97ad3d5673fdb5f
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T16:02:26Z

    fixed some things from Dmitriy's comments, the primary one being that the
    SparkEngine accumulator was doing >= 0 instead of > 0

commit 411e0e92b4721626b736d66c292926fa4fdbb530
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T17:43:21Z

    changing the name of drm.colCounts to drm.getNumNonZeroElements

commit 9655fd70f69ed97eb2d6765928a0a1f7dd760281
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:32:03Z

    meant to say changing drm.colCounts to drm.numNonZeroElementsPerColumn

commit a2001375d46c5946b671f89f5a7cff2e6a094ea8
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:34:32Z

    Merge branch 'master' into mahout-1464

commit 2db06b5566c8dcccb382733613b2fab6c223b5de
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:51:54Z

    typo

commit 0b689b8b879c4ac03b71cf504a9d0d78ffa6bfa5
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T20:03:45Z

    clean up test

commit 32afbe5e552ab94979dd545d14cda17ebc9c018e
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T23:42:08Z

    one more fat finger error

commit b91e5e98c47829a5cc099289f83e99e6bf317dd6
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-13T16:18:33Z

    did not account for negative values in the purely mathematical MatrixOps 
and SparkEngine version of numNonZeroElementsPerColumn so fixed this and added 
to tests

commit 9f6fd902f95c7daf687ecb59698f78217dbf6b6b
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-13T16:43:46Z

    merging master to run new tests

----


> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
