[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025549#comment-14025549 ]

Pat Ferrel commented on MAHOUT-1464:
------------------------------------

While I was waiting for the build to settle down, I wrote some more tests for 
different value types. The same rows/columns are used for each input, so all the 
LLR indicator matrices should be identical and should match the Hadoop code's 
output. But using integer values with magnitude greater than 1 returns an empty 
indicator matrix.

input:

    val a = dense(
      (1000, 10, 0, 0, 0),
      (0, 0, 10000, 10, 0),
      (0, 0, 0, 0, 100),
      (10000, 0, 0, 1000, 0))

should produce

    val matrixLLRCoocAtAControl = dense(
      (0.0, 1.7260924347106847, 0, 0, 0),
      (1.7260924347106847, 0, 0, 0, 0),
      (0, 0, 0, 1.7260924347106847, 0),
      (0, 0, 1.7260924347106847, 0, 0),
      (0, 0, 0, 0, 0)
    )
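For reference, the repeated value 1.7260924347106847 is just the G^2 log-likelihood ratio for the 2x2 contingency table that each cooccurring column pair produces here (k11 = 1 cooccurrence, k12 = 1, k21 = 0, k22 = 2, over 4 rows). A minimal standalone sketch of that computation, re-derived from the standard entropy formulation rather than taken from Mahout's source (object and method names are my own):

```scala
object LlrCheck {
  // x * ln(x), with the convention 0 * ln(0) = 0
  def xLogX(x: Double): Double = if (x == 0.0) 0.0 else x * math.log(x)

  // unnormalized Shannon entropy of a set of counts
  def entropy(counts: Double*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // G^2 log-likelihood ratio for a 2x2 contingency table
  def llr(k11: Double, k12: Double, k21: Double, k22: Double): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    2.0 * (rowEntropy + colEntropy - matEntropy)
  }

  def main(args: Array[String]): Unit = {
    // columns 0 and 1 of the binarized input cooccur in exactly one of 4 rows
    println(llr(1, 1, 0, 2)) // prints 1.7260924347106847
  }
}
```

Note the counts are the same whether the raw values are 1 or 10000, which is why the control matrix should not change with value magnitude.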

However, the following returns an empty matrix.

    val drmCooc = CooccurrenceAnalysis.cooccurrences(drmARaw = drmA, drmBs = Array(drmB))
    //var cp = drmSelfCooc(0).checkpoint()
    //cp.writeDRM("/tmp/cooc-spark/") // to get values written
    val matrixSelfCooc = drmCooc(0).checkpoint().collect

matrixSelfCooc is always empty. Running the same input through the Hadoop 
Mahout version using LLR produces the correct result, equal to 
matrixLLRCoocAtAControl.

Still investigating why this happens.

> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
