[ 
https://issues.apache.org/jira/browse/MAHOUT-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466480#comment-13466480
 ] 

Ted Dunning commented on MAHOUT-1086:
-------------------------------------

OK.  I seem to be able to replicate the problem with 

    trunk@1380432 MAHOUT-1059 - Abstract the idea of a cached length
    1cb76f01b9b504fdf33a7ef6e30afdbd7d3842ef

and not before.  This commit also modifies AbstractVector in ways that 
might be the cause.

The changes involved have to do with whether operations on sparse vectors 
are actually carried out sparsely.  For instance, the code that builds a copy 
from like() used to be this:
{code}
    Vector result = like().assign(this);
{code}
This causes a dense iteration, which is wrong for a sparse vector.  The new 
code does this instead:
{code}
    Vector result;
    if (isDense()) {
      result = like().assign(this);
    } else {
      result = like();
      Iterator<Element> i = this.iterateNonZero();
      while (i.hasNext()) {
        final Element element = i.next();
        result.setQuick(element.index(), element.get());
      }
    }
{code}
The idea is that if the source of the data is sparse, we only need to assign 
the non-zero elements, since we know the newly minted destination will be 
zero-filled.
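To make the reasoning concrete, here is a small self-contained sketch. It does NOT use Mahout's Vector API: `SparseVec`, `copyDense` and `copyNonZero` are hypothetical stand-ins, with a TreeMap holding only the non-zero entries. It shows that, when the destination starts out all zero (as a fresh like() result is assumed to), copying only the non-zeros yields the same contents as a dense copy while touching far fewer indices.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not Mahout's AbstractVector or Vector API. */
public class SparseCopyDemo {

  /** Minimal sparse vector: only non-zero entries are stored, all others read as 0.0. */
  static final class SparseVec {
    final int size;
    final TreeMap<Integer, Double> entries = new TreeMap<>();

    SparseVec(int size) { this.size = size; }

    double get(int i) { return entries.getOrDefault(i, 0.0); }

    void set(int i, double v) {
      // Storing an explicit zero is equivalent to removing the entry.
      if (v == 0.0) { entries.remove(i); } else { entries.put(i, v); }
    }
  }

  /** Dense-style copy (like the old code path): visits every index, zeros included. */
  static int copyDense(SparseVec src, SparseVec dst) {
    int visits = 0;
    for (int i = 0; i < src.size; i++) {
      dst.set(i, src.get(i));
      visits++;
    }
    return visits;
  }

  /** Sparse-style copy: correct only because dst is assumed to be freshly zero-filled. */
  static int copyNonZero(SparseVec src, SparseVec dst) {
    int visits = 0;
    for (Map.Entry<Integer, Double> e : src.entries.entrySet()) {
      dst.set(e.getKey(), e.getValue());
      visits++;
    }
    return visits;
  }

  public static void main(String[] args) {
    SparseVec src = new SparseVec(1_000_000);
    src.set(7, 1.5);
    src.set(42, -2.0);
    src.set(999_999, 3.25);

    // Both destinations play the role of like(): same size, all zeros.
    SparseVec a = new SparseVec(src.size);
    SparseVec b = new SparseVec(src.size);
    int denseVisits = copyDense(src, a);
    int sparseVisits = copyNonZero(src, b);

    System.out.println("dense visits:  " + denseVisits);   // 1000000
    System.out.println("sparse visits: " + sparseVisits);  // 3
    System.out.println("same result:   " + a.entries.equals(b.entries));  // true
  }
}
```

The sparse path is only equivalent under the zero-filled-destination assumption; reusing a destination that already held non-zero values would leave stale entries behind.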

My feeling is that this code is correct, but there is a more complex change 
later in the same diff that might have changed some results.

I will isolate these changes and see whether I can determine what they 
changed and how they affect the canopy code.  
                
> Mean Shift Test Now Produces 4 Clusters
> ---------------------------------------
>
>                 Key: MAHOUT-1086
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1086
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Jeff Eastman
>
> Something changed in Mahout around 9/6/12 that caused 
> TestMeanShift.testCanopyEuclideanMRJobNoClustering to return 4 clusters 
> rather than 3. All of the other tests using the same data still return 3 
> clusters. No changes were made to any of the MeanShiftCanopy classes other 
> than 1 formatting change to the driver, so I'm at a loss as to the cause.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
