[ 
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836624#action_12836624
 ] 

Robin Anil commented on MAHOUT-300:
-----------------------------------

I think the irregularity is due to the sparse vector generation process: 
r.nextInt(cardinality) can return the same index more than once, so some 
vectors end up with fewer non-zero entries than the requested sparsity value.

{code}
      Vector v = new SequentialAccessSparseVector(cardinality, sparsity); // sparsity!
      int[] indexes = new int[sparsity];
      double[] values = new double[sparsity];
      for (int j = 0; j < sparsity; j++) {
        double value = r.nextGaussian();
        int index = sparsity < cardinality ? r.nextInt(cardinality) : j;
        v.set(index, value);
        indexes[j] = index;
        values[j] = value;
      }
{code}
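
To make the effect concrete, here is a minimal standalone sketch (not part of the benchmark; the class and method names are made up for illustration, and it uses only java.util rather than any Mahout class). It draws indexes with replacement, the way the loop above does, and counts how many distinct ones remain.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical demo class, not part of Mahout.
public class DuplicateIndexDemo {

  // Draws `sparsity` indexes uniformly from [0, cardinality) with replacement
  // and returns how many distinct indexes were produced.
  public static int distinctIndexes(Random r, int cardinality, int sparsity) {
    Set<Integer> seen = new HashSet<>();
    for (int j = 0; j < sparsity; j++) {
      seen.add(r.nextInt(cardinality));
    }
    return seen.size();
  }

  public static void main(String[] args) {
    Random r = new Random(42L);
    int cardinality = 1000;
    int sparsity = 500;
    int distinct = distinctIndexes(r, cardinality, sparsity);
    System.out.println("requested=" + sparsity + " distinct=" + distinct);
  }
}
```

The expected number of distinct indexes is cardinality * (1 - (1 - 1/cardinality)^sparsity), so with cardinality 1000 and sparsity 500 one should see roughly 390 on average instead of 500.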

Instead, I suggest this:

{code}
      Vector v = new SequentialAccessSparseVector(cardinality, sparsity); // sparsity!
      boolean[] featureSpace = new boolean[cardinality];
      int[] indexes = new int[sparsity];
      double[] values = new double[sparsity];
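      int j = 0;
      // Redraw on collision so that exactly `sparsity` distinct indexes are set.
      // Assumes sparsity <= cardinality; otherwise this loop never terminates.
      while (j < sparsity) {
        double value = r.nextGaussian();
        int index = r.nextInt(cardinality);
        if (!featureSpace[index]) {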
          featureSpace[index] = true;
          indexes[j] = index;
          values[j++] = value;
          v.set(index, value);
        }
      }
{code}
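
The same idea as a standalone sketch, leaving out the Mahout Vector so it runs with plain java.util. The class and method names are hypothetical, and the explicit argument check is an addition of this sketch: it assumes sparsity <= cardinality, since otherwise the redraw loop could never finish.

```java
import java.util.Random;

// Hypothetical helper class, not part of Mahout.
public class DistinctIndexSampler {

  // Rejection sampling: redraw whenever an index was already used, so the
  // result holds exactly `sparsity` distinct indexes in [0, cardinality).
  public static int[] sampleDistinct(Random r, int cardinality, int sparsity) {
    if (sparsity < 0 || sparsity > cardinality) {
      // Without this check the loop below could spin forever.
      throw new IllegalArgumentException("sparsity must be in [0, cardinality]");
    }
    boolean[] used = new boolean[cardinality];
    int[] indexes = new int[sparsity];
    int j = 0;
    while (j < sparsity) {
      int index = r.nextInt(cardinality);
      if (!used[index]) {
        used[index] = true;
        indexes[j++] = index;
      }
    }
    return indexes;
  }

  public static void main(String[] args) {
    int[] indexes = sampleDistinct(new Random(42L), 1000, 500);
    System.out.println("generated " + indexes.length + " distinct indexes");
  }
}
```

One thing to keep in mind: the number of redraws grows as sparsity approaches cardinality, so for nearly dense vectors the generation step itself gets slower.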

> Solve performance issues with Vector Implementations
> ----------------------------------------------------
>
>                 Key: MAHOUT-300
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-300
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch, 
> MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch
>
>
> AbstractVector operations like times
>   public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       int index = element.index();
>       result.setQuick(index, element.get() * x);
>     }
>     return result;
>   }
> should be implemented as follows
>  public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = result.iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       element.set(element.get() * x);
>     }
>     return result;
>   }

