[ 
https://issues.apache.org/jira/browse/MAHOUT-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883274#action_12883274
 ] 

Hudson commented on MAHOUT-418:
-------------------------------

Integrated in Mahout-Quality #104 (See 
[http://hudson.zones.apache.org/hudson/job/Mahout-Quality/104/])
    

> Computing the pairwise similarities of the rows of a matrix
> -----------------------------------------------------------
>
>                 Key: MAHOUT-418
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-418
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Sebastian Schelter
>         Attachments: MAHOUT-418-2.patch, MAHOUT-418-3.patch, MAHOUT-418.patch
>
>
> In response to the wish from MAHOUT-362 and the latest discussion on the 
> mailing list started by Kris Jack about computing a document similarity 
> matrix, I tried to generalize the approach we're already using to compute the 
> item-item similarities for collaborative filtering.
> The job in the patch computes the pairwise similarities of the rows of a 
> matrix in a distributed manner. It uses a 
> SequenceFile<IntWritable,VectorWritable> as input and outputs such a file 
> too. Custom similarity implementations can be supplied; I've already 
> implemented Tanimoto and cosine for demo and testing purposes. The algorithm 
> is based on the one presented here: 
> http://www.umiacs.umd.edu/~jimmylin/publications/Elsayed_etal_ACL2008_short.pdf
> I'd be glad if someone could verify the applicability of this approach by 
> running it with a reasonably large input. I'm also worried that it might 
> buffer too much data in certain steps.
> If you decide to include it in Mahout, some more effort and decisions (like 
> more tests, more similarity measures, integration with DistributedRowMatrix) 
> would be needed, I guess.
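
For readers who want a feel for the approach described above, here is a 
minimal single-machine sketch in plain Java. It is not the code from the 
patch and it uses no Mahout or Hadoop classes. The class name 
RowSimilaritySketch, the method pairwiseCosine and the Map-based sparse row 
representation are all made up for illustration. It assumes each row stores 
only non-zero values, and it follows the general idea of inverting the matrix 
by column and summing partial products per row pair, here with cosine as the 
similarity measure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative, in-memory sketch only; not the MAHOUT-418 implementation.
final class RowSimilaritySketch {

    /** One non-zero matrix cell as seen from its column: (row index, value). */
    private static final class Cell {
        final int row;
        final double value;
        Cell(int row, double value) { this.row = row; this.value = value; }
    }

    /**
     * Returns cosine similarities as result.get(i).get(j) for i < j.
     * Row pairs that share no column are omitted from the result.
     * Assumes every stored value is non-zero (so no row norm is zero).
     */
    static Map<Integer, Map<Integer, Double>> pairwiseCosine(List<Map<Integer, Double>> rows) {
        // Step 1: invert the matrix (column -> cells) and compute each row's norm.
        Map<Integer, List<Cell>> columns = new HashMap<>();
        double[] norms = new double[rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            double sumOfSquares = 0.0;
            for (Map.Entry<Integer, Double> e : rows.get(r).entrySet()) {
                columns.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                       .add(new Cell(r, e.getValue()));
                sumOfSquares += e.getValue() * e.getValue();
            }
            norms[r] = Math.sqrt(sumOfSquares);
        }

        // Step 2: each pair of rows that co-occurs in a column contributes one
        // partial product; summing them per pair yields the dot product.
        // Cells were added in ascending row order, so x.row < y.row here.
        Map<Integer, Map<Integer, Double>> dots = new HashMap<>();
        for (List<Cell> cells : columns.values()) {
            for (int a = 0; a < cells.size(); a++) {
                for (int b = a + 1; b < cells.size(); b++) {
                    Cell x = cells.get(a);
                    Cell y = cells.get(b);
                    dots.computeIfAbsent(x.row, k -> new HashMap<>())
                        .merge(y.row, x.value * y.value, Double::sum);
                }
            }
        }

        // Step 3: turn dot products into cosine similarities.
        for (Map.Entry<Integer, Map<Integer, Double>> e : dots.entrySet()) {
            final int i = e.getKey();
            e.getValue().replaceAll((j, dot) -> dot / (norms[i] * norms[j]));
        }
        return dots;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        rows.add(Map.of(0, 1.0, 1, 1.0));
        rows.add(Map.of(0, 1.0));
        rows.add(Map.of(2, 3.0));
        System.out.println(pairwiseCosine(rows));
    }
}
```

The nested loop in step 2 touches every pair of rows within a column, so a 
very dense column produces a quadratic number of partial products. That is 
one place where a distributed version could end up buffering a lot of data.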

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
