Complete minsize constraints for similarity measures used in RowSimilarityJob
-----------------------------------------------------------------------------
Key: MAHOUT-803
URL: https://issues.apache.org/jira/browse/MAHOUT-803
Project: Mahout
Issue Type: Task
Components: Math
Affects Versions: 0.6
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
The latest implementation of RowSimilarityJob allows specifying a threshold for
the minimum similarity value of the resulting row pairs.
A measure can specify a minsize constraint via
VectorSimilarityMeasure.consider(...) to prune some candidate pairs very early
by looking at some statistics computed for the single rows.
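For example, if cooccurrence count is used as the similarity measure and a
threshold of 5 is set, then all row pairs where one of the vectors has fewer
than 5 non-zero components can be discarded, because the cooccurrence count
can never exceed the number of non-zero components of either vector.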
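
The cooccurrence example can be sketched roughly as follows. This is only an
illustration, not the Mahout implementation: the class name
CooccurrencePruningSketch and the simplified three-argument consider(...)
method are assumptions made for this sketch, and the real
VectorSimilarityMeasure.consider(...) may take different parameters.

```java
// Minimal sketch of an early pruning check for cooccurrence count.
// NOTE: class name and method signature are hypothetical, chosen for
// illustration; they are not taken from the Mahout code base.
class CooccurrencePruningSketch {

    // Returns true if the row pair could still reach the threshold.
    // The cooccurrence count of two rows is at most the smaller of their
    // non-zero counts, so a pair is only worth examining when both rows
    // have at least `threshold` non-zero components.
    static boolean consider(int numNonZeroEntriesA, int numNonZeroEntriesB,
                            double threshold) {
        return numNonZeroEntriesA >= threshold
            && numNonZeroEntriesB >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 5.0;
        // One row has only 4 non-zero components: the pair is discarded.
        System.out.println(consider(4, 10, threshold));
        // Both rows have at least 5 non-zero components: the pair is kept.
        System.out.println(consider(5, 7, threshold));
    }
}
```

The check only needs per-row statistics (the non-zero counts), which is why
it can run before any pairwise computation takes place.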
These minsize constraints are still missing for CityBlockSimilarity,
LoglikelihoodSimilarity and EuclideanDistanceSimilarity.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira