[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030843#comment-14030843 ]
ASF GitHub Bot commented on MAHOUT-1464:
----------------------------------------

GitHub user pferrel opened a pull request:

    https://github.com/apache/mahout/pull/18

    MAHOUT-1464

    The numNonZeroElementsPerColumn additions did not account for negative
    values; they counted only the positive non-zero values. Fixed this in both
    the in-core and distributed cases. I added a Functions.notEqual to
    Functions.java. It may be possible to do this with the other functions,
    but it wasn't obvious how, so I wrote one. The test is in MatrixOpsSuite,
    where it is used. The distributed case was much simpler. Changed the tests
    to include negative values.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pferrel/mahout mahout-1464

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18

----
commit 107a0ba9605241653a85b113661a8fa5c055529f
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-04T19:54:22Z

    added Sebastian's CooccurrenceAnalysis patch updated it to use current Mahout-DSL

commit 16c03f7fa73c156859d1dba3a333ef9e8bf922b0
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-04T21:32:18Z

    added Sebastian's MurmurHash changes

    Signed-off-by: pferrel <p...@occamsmachete.com>

commit c6adaa44c80bba99d41600e260bbb1ad5c972e69
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-05T16:52:23Z

    MAHOUT-1464 import cleanup, minor changes to examples for running on Spark Cluster

commit 1d66e5726e71e297ef4a7a27331463ba363098c0
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:19:32Z

    scalatest for cooccurrence cross and self along with other CooccurrenceAnalyisi methods

commit 766db0f9e7feb70520fbd444afcb910788f01e76
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:20:46Z

    Merge branch 'master' into mahout-1464

commit e492976688cb8860354bb20a362d370405f560e1
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T20:50:07Z

    cleaned up test comments

commit a49692eb1664de4b15de1864b95701a6410c80c8
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-06T21:09:55Z

    got those cursed .DS_Stores out of the branch and put an exclude in .gitignore

commit 268290d28d4f83cc47a7e6baebc5eb4c53d7c8da
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-07T21:50:04Z

    Merge branch 'master' into mahout-1464

commit 63b10704390e18f513cca30596b1d25e146a6edd
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-08T15:26:36Z

    Merge branch 'master' into mahout-1464

commit ac00d7655c4cba5f6c6dcb4882be95656b17a834
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-09T14:11:43Z

    Merge branch 'master' into mahout-1464

commit fb008efeae3d5f6f6ba350fbc2ef3944da1dcaef
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T02:17:27Z

    added 'colCounts' to a drm using the SparkEngine and MatrixOps, which, when used in cooccurrence, fixes the problem with non-boolean preference values

commit 5b04cb31403e2521d9874ad5e14f28cd0af26c26
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T02:18:29Z

    Merge branch 'master' into mahout-1464

commit e451a2a596f5ceda8d1b4990e97ad3d5673fdb5f
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T16:02:26Z

    fixed some things from Dmitiy's comments, primary being the SparkEngine accumulator was doing >= 0 instead of > 0

commit 411e0e92b4721626b736d66c292926fa4fdbb530
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T17:43:21Z

    changing the name of drm.colCounts to drm.getNumNonZeroElements

commit 9655fd70f69ed97eb2d6765928a0a1f7dd760281
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:32:03Z

    meant to say changing drm.colCounts to drm.numNonZeroElementsPerColumn

commit a2001375d46c5946b671f89f5a7cff2e6a094ea8
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:34:32Z

    Merge branch 'master' into mahout-1464

commit 2db06b5566c8dcccb382733613b2fab6c223b5de
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T18:51:54Z

    typo

commit 0b689b8b879c4ac03b71cf504a9d0d78ffa6bfa5
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T20:03:45Z

    clean up test

commit 32afbe5e552ab94979dd545d14cda17ebc9c018e
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-12T23:42:08Z

    one more fat finger error

commit b91e5e98c47829a5cc099289f83e99e6bf317dd6
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-13T16:18:33Z

    did not account for negative values in the purely mathematical MatrixOps and SparkEngine version of numNonZeroElementsPerColumn so fixed this and added to tests

commit 9f6fd902f95c7daf687ecb59698f78217dbf6b6b
Author: pferrel <p...@occamsmachete.com>
Date:   2014-06-13T16:43:46Z

    merging master to run new tests

----

> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM
> can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has
> several applications including cross-action recommendations.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
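
For readers following the fix described in the pull request, here is a minimal sketch of the difference between counting positive entries and counting non-zero entries per column. It uses plain Java arrays and does not use Mahout's Matrix, DRM, or Functions API. The class and method names (NonZeroColumnCounts, positivePerColumn, nonZeroPerColumn) are hypothetical and chosen only for illustration, and the input is assumed to be a rectangular matrix.

```java
import java.util.Arrays;

public class NonZeroColumnCounts {

    // Variant with the problem the pull request describes: only strictly
    // positive entries are counted, so negative entries are missed.
    static int[] positivePerColumn(double[][] m) {
        int cols = m.length == 0 ? 0 : m[0].length;
        int[] counts = new int[cols];
        for (double[] row : m) {
            for (int c = 0; c < cols; c++) {
                if (row[c] > 0) {
                    counts[c]++;
                }
            }
        }
        return counts;
    }

    // Corrected variant: every entry that differs from zero is counted,
    // whatever its sign (the "not equal to zero" test).
    static int[] nonZeroPerColumn(double[][] m) {
        int cols = m.length == 0 ? 0 : m[0].length;
        int[] counts = new int[cols];
        for (double[] row : m) {
            for (int c = 0; c < cols; c++) {
                if (row[c] != 0) {
                    counts[c]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical test matrix that includes negative values.
        double[][] m = {
            { 1.0, -2.0, 0.0},
            { 0.0,  3.0, 0.0},
            {-4.0, -5.0, 0.0}
        };
        System.out.println(Arrays.toString(positivePerColumn(m))); // [1, 1, 0]
        System.out.println(Arrays.toString(nonZeroPerColumn(m)));  // [2, 3, 0]
    }
}
```

The two results differ exactly in the columns that contain negative entries, which is why test data without negative values would not expose the problem.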