[
https://issues.apache.org/jira/browse/MAHOUT-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598772#comment-13598772
]
Hudson commented on MAHOUT-1019:
--------------------------------
Integrated in Mahout-Quality #1894 (See [https://builds.apache.org/job/Mahout-Quality/1894/])
MAHOUT-1019 VectorDistanceSimilarityJob (Revision 1455095)
Result = SUCCESS
ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455095
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/VectorDistanceMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/VectorDistanceSimilarityJob.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/TestVectorDistanceSimilarityJob.java
> VectorDistanceSimilarityJob
> ---------------------------
>
> Key: MAHOUT-1019
> URL: https://issues.apache.org/jira/browse/MAHOUT-1019
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.8
> Environment: all
> Reporter: Timothy Potter
> Priority: Minor
> Labels: VectorDistanceSimilarityJob, distance, vector
> Attachments: MAHOUT-1019.patch
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> The VectorDistanceSimilarityJob is a fantastic tool, but it risks creating
> terabytes of output of dubious value. For example, I have ~10K seed vectors
> and millions of vectors to compare them against. I would like to add an
> optional parameter to this job that specifies a maximum distance threshold,
> so that any distance above this value is not written to the output. The
> default would be 1.0d so no filtering is applied, which ensures backwards
> compatibility; if the parameter is supplied, only rows where the distance is
> less than the threshold would be output from the mapper. This can reduce the
> storage requirements of the output immensely. The parameter could be named
> something like: noOutputIfDistanceGreaterThan
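
The filtering idea in the quoted description can be sketched roughly as follows. This is a minimal standalone illustration, not the code from MAHOUT-1019.patch. The class name DistanceThresholdSketch, the method keptSeeds, the use of Euclidean distance, and the use of a null threshold to mean "no filtering" are all assumptions made for the example. The sketch keeps a pair when its distance is not above the threshold; the description leaves open whether a distance exactly equal to the threshold should be kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed threshold filter; names are illustrative
// and do not come from the Mahout source or the attached patch.
final class DistanceThresholdSketch {

  /** Euclidean distance between two dense vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns the indices of the seeds whose distance to the given vector would
   * be written to the output. Assumption: a null maxDistance means that no
   * threshold was supplied, so every pair is kept.
   */
  static List<Integer> keptSeeds(double[][] seeds, double[] vector, Double maxDistance) {
    List<Integer> kept = new ArrayList<>();
    for (int s = 0; s < seeds.length; s++) {
      double distance = euclidean(seeds[s], vector);
      // Skip any pair whose distance is above the threshold.
      if (maxDistance == null || distance <= maxDistance) {
        kept.add(s);
      }
    }
    return kept;
  }
}
```

In a real job the same check would sit in the mapper just before each record is written, so that pairs above the threshold never reach the output at all.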
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira