[
https://issues.apache.org/jira/browse/MAHOUT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589396#action_12589396
]
Samee Zahur commented on MAHOUT-42:
-----------------------------------
I think what you are doing here is exactly what I tried to hide when designing
VectorPair in MAHOUT-34, which basically did the same thing. (Maybe the
committers were hoping for a more general solution.)
In any case: say that when you call calculate(vector0,vector1), features 0 and 2
get visited. The variables are now:
{noformat}
a2 = vector0[0]^2 + vector0[2]^2
b2 = vector1[0]^2 + vector1[2]^2
{noformat}
Then, when you call calculate(vector1,vector0), let's say feature 1 gets visited.
But the method was invoked with its parameters reversed, so the variables now get
these values:
{noformat}
a2 = vector0[0]^2 + vector0[2]^2 + vector1[1]^2
b2 = vector1[0]^2 + vector1[2]^2 + vector0[1]^2
{noformat}
This is probably not what you had in mind.
This is the kind of bug you would expect to show up if we have
vector0=sparsevector{1,0,2} and vector1=sparsevector{3,3,2}.
You probably expect a2=5 and b2=22, but you will get a2=14 and b2=13.
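The scenario can be reproduced with a minimal sketch. Everything in it is hypothetical and not taken from the MAHOUT-42 patch: the class name, the dense double[] arrays standing in for sparse vectors, and the explicit list of visited features passed to calculate(). It only assumes that a2 and b2 are running sums shared by both calls.

```java
// Hypothetical sketch; names are illustrative, not from the MAHOUT-42 patch.
class SwapBugSketch {
    // Running sums of squares, shared by both passes.
    double a2 = 0;
    double b2 = 0;

    // One pass: for each visited feature, the first argument feeds a2
    // and the second argument feeds b2.
    void calculate(double[] first, double[] second, int[] visited) {
        for (int i : visited) {
            a2 += first[i] * first[i];
            b2 += second[i] * second[i];
        }
    }

    // Second pass is invoked with the arguments reversed.
    static double[] reversedSecondCall(double[] v0, double[] v1) {
        SwapBugSketch s = new SwapBugSketch();
        s.calculate(v0, v1, new int[] {0, 2}); // features non-zero in v0
        s.calculate(v1, v0, new int[] {1});    // remaining feature, swapped arguments
        return new double[] {s.a2, s.b2};
    }

    // Second pass keeps the argument order, so each sum stays with its vector.
    static double[] consistentSecondCall(double[] v0, double[] v1) {
        SwapBugSketch s = new SwapBugSketch();
        s.calculate(v0, v1, new int[] {0, 2});
        s.calculate(v0, v1, new int[] {1});
        return new double[] {s.a2, s.b2};
    }

    public static void main(String[] args) {
        double[] v0 = {1, 0, 2};
        double[] v1 = {3, 3, 2};
        double[] bad = reversedSecondCall(v0, v1);
        double[] good = consistentSecondCall(v0, v1);
        System.out.println("reversed:   a2=" + bad[0] + " b2=" + bad[1]);
        System.out.println("consistent: a2=" + good[0] + " b2=" + good[1]);
    }
}
```

With the reversed second call, the square of vector1[1] lands in a2 and the square of vector0[1] lands in b2, so neither sum is the squared norm of a single vector.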
> Tanimoto coefficient distance measure
> -------------------------------------
>
> Key: MAHOUT-42
> URL: https://issues.apache.org/jira/browse/MAHOUT-42
> Project: Mahout
> Issue Type: New Feature
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Attachments: MAHOUT-42.txt
>
>
> http://en.wikipedia.org/wiki/Jaccard_index