Complete minsize constraints for similarity measures used in RowSimilarityJob
-----------------------------------------------------------------------------

                 Key: MAHOUT-803
                 URL: https://issues.apache.org/jira/browse/MAHOUT-803
             Project: Mahout
          Issue Type: Task
          Components: Math
    Affects Versions: 0.6
            Reporter: Sebastian Schelter
            Assignee: Sebastian Schelter


The latest implementation of RowSimilarityJob allows specifying a threshold for 
the minimum similarity value of the resulting row pairs.

A measure can specify a min-size constraint via 
VectorSimilarityMeasure.consider(...) to prune candidate pairs very early, 
using only statistics computed for each row on its own.

For example, if cooccurrence count is used as the similarity measure and a 
threshold of 5 is set, then all row pairs where one of the vectors has fewer 
than 5 non-zero components can be discarded.
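
The cooccurrence case above can be sketched as follows. This is only an 
illustration, not the actual Mahout code: the class name, the method name and 
its parameters are assumptions made for this example and do not reproduce the 
real VectorSimilarityMeasure.consider(...) signature. It relies on the fact 
that the cooccurrence count of two rows can never exceed the number of 
non-zero components in the sparser row.

```java
// Hypothetical, standalone sketch of a min-size constraint for cooccurrence
// count. Names and parameters are illustrative assumptions, not Mahout's API.
public class CooccurrencePruning {

    /**
     * Decides whether a row pair can still reach the similarity threshold.
     * The cooccurrence count is bounded above by
     * min(numNonZeroA, numNonZeroB), so a pair is only worth examining when
     * both rows have at least `threshold` non-zero components.
     */
    static boolean consider(int numNonZeroA, int numNonZeroB, double threshold) {
        return numNonZeroA >= threshold && numNonZeroB >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 5.0;
        // Both rows have at least 5 non-zero components: the pair is kept.
        System.out.println(consider(7, 5, threshold));
        // One row has only 4 non-zero components: the pair is pruned.
        System.out.println(consider(7, 4, threshold));
    }
}
```

Because the check needs only per-row counts, it can run before the two rows 
are ever compared component by component.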

These min-size constraints are still missing for CityBlockSimilarity, 
LoglikelihoodSimilarity and EuclideanDistanceSimilarity.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
