[ 
https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589396#action_12589396
 ] 

Samee Zahur commented on MAHOUT-42:
-----------------------------------

I think what you are doing here is exactly what I tried to hide when designing 
VectorPair in MAHOUT-34, which basically did the same thing. (Maybe the 
committers were hoping for a more general solution.)

In any case: say that when you call calculate(vector0, vector1), features 0 and 
2 get visited. The variables are now:

{noformat}
a2 = vector0[0]^2 + vector0[2]^2
b2 = vector1[0]^2 + vector1[2]^2
{noformat}

Then when you call calculate(vector1, vector0), let's say feature 1 gets 
visited. But the method was invoked with the parameters reversed, so the 
variables now get these values:

{noformat}
a2 = vector0[0]^2 + vector0[2]^2 + vector1[1]^2
b2 = vector1[0]^2 + vector1[2]^2 + vector0[1]^2
{noformat}

This is probably not what you had in mind.
This is the kind of bug you would expect to show up with 
vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}.
You would expect a2=5 and b2=22, but you will get a2=14 and b2=13.
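
A minimal sketch that reproduces the scenario, in case it helps. This is not the code from the MAHOUT-42 patch: the class SquaredSumAccumulator, the visited set and the use of plain double[] arrays in place of sparse vectors are all assumptions made for illustration. It only assumes that calculate(first, second) visits the not-yet-seen non-zero features of its first argument and always credits the first argument to a2 and the second to b2.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the accumulation described above,
// not the actual MAHOUT-42 implementation.
class SquaredSumAccumulator {
    double a2; // intended: sum of squares of vector0
    double b2; // intended: sum of squares of vector1
    private final Set<Integer> visited = new HashSet<>();

    // Visits every not-yet-seen feature that is non-zero in `first`,
    // adding first[i]^2 to a2 and second[i]^2 to b2.
    void calculate(double[] first, double[] second) {
        for (int i = 0; i < first.length; i++) {
            if (first[i] != 0.0 && visited.add(i)) {
                a2 += first[i] * first[i];
                b2 += second[i] * second[i];
            }
        }
    }

    // Calls calculate in both argument orders, as in the example above.
    static double[] run(double[] vector0, double[] vector1) {
        SquaredSumAccumulator acc = new SquaredSumAccumulator();
        acc.calculate(vector0, vector1); // visits features 0 and 2
        acc.calculate(vector1, vector0); // visits feature 1, roles swapped
        return new double[] {acc.a2, acc.b2};
    }

    public static void main(String[] args) {
        double[] r = run(new double[] {1, 0, 2}, new double[] {3, 3, 2});
        // prints a2=14.0, b2=13.0 (the intended sums are 5.0 and 22.0)
        System.out.println("a2=" + r[0] + ", b2=" + r[1]);
    }
}
```

On the second call vector1[1]^2 lands in a2 and vector0[1]^2 lands in b2, which is the mix-up described above. Hiding the argument order behind something like VectorPair would be one way to avoid it.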

> Tanimoto coefficient distance measure
> -------------------------------------
>
>                 Key: MAHOUT-42
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-42
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>         Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index

