[ 
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836370#action_12836370
 ] 

Robin Anil commented on MAHOUT-300:
-----------------------------------

ok. Made maxValue and maxValueIndex as per your comments. Only difference in 
behaviour is for random access it could return any index which is zero as the 
max value as things are not ordered.

I have modified dot and minus, a frequently used functions in distance measures 
and optimised them as follows

{code}
  public double dot(Vector x) {
    if (size() != x.size()) {
      throw new CardinalityException(size(), x.size());
    }
    double result = 0;
    if (this instanceof SequentialAccessSparseVector
        && x instanceof SequentialAccessSparseVector) {
      // For sparse SeqAccVectors. do dot product without lookup in a linear 
fashion
      Iterator<Element> myIter = iterateNonZero();
      Iterator<Element> otherIter = x.iterateNonZero();
      Element myCurrent = null;
      Element otherCurrent = null;
      while (myIter.hasNext() && otherIter.hasNext()) {
        if (myCurrent == null) myCurrent = myIter.next();
        if (otherCurrent == null) otherCurrent = otherIter.next();
        
        int myIndex = myCurrent.index();
        int otherIndex = otherCurrent.index();
        
        if (myIndex < otherIndex) {
          // due to the sparseness skipping occurs more hence checked before 
equality
          myCurrent = null;
        } else if (myIndex > otherIndex){
          otherCurrent = null;
        } else { // both are equal 
          result += myCurrent.get() * otherCurrent.get();
          myCurrent = null;
          otherCurrent = null;
        } 
      }
      return result;
    } else if ((this instanceof RandomAccessSparseVector || this instanceof 
DenseVector)
               && (x instanceof SequentialAccessSparseVector || x instanceof 
RandomAccessSparseVector)) {
      // Try to get the speed boost associated fast/normal seq access on x and 
quick lookup on this
      Iterator<Element> iter = x.iterateNonZero();
      while (iter.hasNext()) {
        Element element = iter.next();
        result += element.get() * getQuick(element.index());
      }
      return result;
    } else { // TODO: can optimize more based on the numDefaultElements in the 
vectors
      Iterator<Element> iter = iterateNonZero();
      while (iter.hasNext()) {
        Element element = iter.next();
        result += element.get() * x.getQuick(element.index());
      }
      return result;
    }
  }

  public Vector minus(Vector x) {
    if (size() != x.size()) {
      throw new CardinalityException();
    }
    if (x instanceof RandomAccessSparseVector || x instanceof DenseVector) {
      Vector result = x.clone();
      Iterator<Element> iter = iterateNonZero();
      while (iter.hasNext()) {
        Element e = iter.next();
        result.setQuick(e.index(), result.getQuick(e.index()) - e.get());
      }
      return result;
    } else { // TODO: check the numNonDefault elements to further optimize 
      Vector result = clone();
      Iterator<Element> iter = x.iterateNonZero();
      while (iter.hasNext()) {
        Element e = iter.next();
        result.setQuick(e.index(), getQuick(e.index()) - e.get());
      }
      return result;
    }
  }

{code}

Based on all these optimisation, the before and after picture. Note: these are 
same impl benchmarks seq.dot(seq) etc.
{noformat}
BenchMarks                    DenseVector                   
RandomAccessSparseVector      SequentialAccessSparseVector
DotProduct                                                                      
                                        
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 0.132436s;          sumTime = 
1.354725s;          sumTime = 1.78453s;           
                              minTime = 0.0050ms;           minTime = 0.053ms;  
          minTime = 0.083ms;            
                              maxTime = 2.996ms;            maxTime = 54.293ms; 
          maxTime = 8.921ms;            
                              meanTime = 0.006621ms;        meanTime = 
0.067736ms;        meanTime = 0.089226ms;        
                              stdDevTime = 0.029368ms;      stdDevTime = 
0.417954ms;      stdDevTime = 0.078909ms;      
                              Speed = 151016.34 /sec        Speed = 14763.144 
/sec        Speed = 11207.433 /sec        
                              Rate = 1812.1962 MB/s         Rate = 177.15773 
MB/s         Rate = 134.4892 MB/s          
DotProduct                                                                      
                                        
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 0.127648s;          sumTime = 
1.076684s;          sumTime = 0.28348s;           
                              minTime = 0.0050ms;           minTime = 0.043ms;  
          minTime = 0.0090ms;           
                              maxTime = 1.101ms;            maxTime = 20.54ms;  
          maxTime = 4.556ms;            
                              meanTime = 0.006382ms;        meanTime = 
0.053834ms;        meanTime = 0.014174ms;        
                              stdDevTime = 0.015686ms;      stdDevTime = 
0.207212ms;      stdDevTime = 0.067221ms;      
                              Speed = 156680.88 /sec        Speed = 18575.55 
/sec         Speed = 70551.71 /sec         
                              Rate = 1880.1705 MB/s         Rate = 222.90663 
MB/s         Rate = 846.6206 MB/s          

org.apache.mahout.common.distance.CosineDistanceMeasure                         
                                                                 
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 11.119366s;         sumTime = 
33.521986s;         sumTime = 30.009782s;         
                              minTime = 0.522ms;            minTime = 1.552ms;  
          minTime = 1.426ms;            
                              maxTime = 7.088ms;            maxTime = 47.688ms; 
          maxTime = 19.84ms;            
                              meanTime = 0.555968ms;        meanTime = 
1.676099ms;        meanTime = 1.500489ms;        
                              stdDevTime = 0.090421ms;      stdDevTime = 
0.503179ms;      stdDevTime = 0.168377ms;      
                              Speed = 1798.6637 /sec        Speed = 596.62335 
/sec        Speed = 666.44934 /sec        
                              Rate = 21.583965 MB/s         Rate = 7.1594806 
MB/s         Rate = 7.9973927 MB/s         
org.apache.mahout.common.distance.CosineDistanceMeasure                         
                                                                 
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 1.555915s;          sumTime = 
9.765366s;          sumTime = 2.130194s;          
                              minTime = 0.072ms;            minTime = 0.443ms;  
          minTime = 0.098ms;            
                              maxTime = 7.128ms;            maxTime = 9.95ms;   
          maxTime = 3.214ms;            
                              meanTime = 0.077795ms;        meanTime = 
0.488268ms;        meanTime = 0.106509ms;        
                              stdDevTime = 0.059724ms;      stdDevTime = 
0.163519ms;      stdDevTime = 0.046013ms;      
                              Speed = 12854.172 /sec        Speed = 2048.0544 
/sec        Speed = 9388.815 /sec         
                              Rate = 154.25008 MB/s         Rate = 24.576653 
MB/s         Rate = 112.665794 MB/s

org.apache.mahout.common.distance.EuclideanDistanceMeasure                      
                                                                    
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 4.756927s;          sumTime = 
32.704616s;         sumTime = 16.252285s;         
                              minTime = 0.203ms;            minTime = 1.467ms;  
          minTime = 0.75ms;             
                              maxTime = 2.482ms;            maxTime = 13.63ms;  
          maxTime = 3.763ms;            
                              meanTime = 0.237846ms;        meanTime = 
1.63523ms;         meanTime = 0.812614ms;        
                              stdDevTime = 0.083861ms;      stdDevTime = 
0.357357ms;      stdDevTime = 0.095544ms;      
                              Speed = 4204.395 /sec         Speed = 611.5344 
/sec         Speed = 1230.5962 /sec        
                              Rate = 50.452744 MB/s         Rate = 7.3384137 
MB/s         Rate = 14.767155 MB/s         
org.apache.mahout.common.distance.EuclideanDistanceMeasure                      
                                                                    
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 1.590073s;          sumTime = 
9.936997s;          sumTime = 2.142341s;          
                              minTime = 0.073ms;            minTime = 0.442ms;  
          minTime = 0.098ms;            
                              maxTime = 9.248ms;            maxTime = 13.526ms; 
          maxTime = 12.412ms;           
                              meanTime = 0.079503ms;        meanTime = 
0.496849ms;        meanTime = 0.107117ms;        
                              stdDevTime = 0.07321ms;       stdDevTime = 
0.185974ms;      stdDevTime = 0.100796ms;      
                              Speed = 12578.039 /sec        Speed = 2012.6804 
/sec        Speed = 9335.582 /sec         
                              Rate = 150.93648 MB/s         Rate = 24.152166 
MB/s         Rate = 112.026985 MB/s     

org.apache.mahout.common.distance.ManhattanDistanceMeasure                      
                                                                    
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 3.463162s;          sumTime = 
30.782008s;         sumTime = 38.617236s;         
                              minTime = 0.133ms;            minTime = 1.289ms;  
          minTime = 1.749ms;            
                              maxTime = 19.493ms;           maxTime = 44.531ms; 
          maxTime = 13.406ms;           
                              meanTime = 0.173158ms;        meanTime = 
1.5391ms;          meanTime = 1.930861ms;        
                              stdDevTime = 0.204057ms;      stdDevTime = 
0.496925ms;      stdDevTime = 0.302304ms;      
                              Speed = 5775.069 /sec         Speed = 649.7302 
/sec         Speed = 517.90344 /sec        
                              Rate = 69.30083 MB/s          Rate = 7.796763 
MB/s          Rate = 6.214842 MB/s          
org.apache.mahout.common.distance.ManhattanDistanceMeasure                      
                                                                    
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 3.210922s;          sumTime = 
26.983053s;         sumTime = 34.154993s;         
                              minTime = 0.124ms;            minTime = 1.121ms;  
          minTime = 1.548ms;            
                              maxTime = 20.179ms;           maxTime = 16.974ms; 
          maxTime = 12.913ms;           
                              meanTime = 0.160546ms;        meanTime = 
1.349152ms;        meanTime = 1.707749ms;        
                              stdDevTime = 0.212426ms;      stdDevTime = 
0.490706ms;      stdDevTime = 0.341319ms;      
                              Speed = 6228.74 /sec          Speed = 741.20593 
/sec        Speed = 585.5659 /sec         
                              Rate = 74.74489 MB/s          Rate = 8.894472 
MB/s          Rate = 7.026791 MB/s       

org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure               
                                                                           
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 2.108722s;          sumTime = 
32.750794s;         sumTime = 16.406358s;         
                              minTime = 0.06ms;             minTime = 1.466ms;  
          minTime = 0.75ms;             
                              maxTime = 12.239ms;           maxTime = 
106.425ms;          maxTime = 19.271ms;           
                              meanTime = 0.105436ms;        meanTime = 
1.637539ms;        meanTime = 0.820317ms;        
                              stdDevTime = 0.159718ms;      stdDevTime = 
0.824038ms;      stdDevTime = 0.201979ms;      
                              Speed = 9484.417 /sec         Speed = 610.6722 
/sec         Speed = 1219.0396 /sec        
                              Rate = 113.81301 MB/s         Rate = 7.328067 
MB/s          Rate = 14.628475 MB/s         
org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure               
                                                                           
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 1.57831s;           sumTime = 
9.910735s;          sumTime = 2.123169s;          
                              minTime = 0.072ms;            minTime = 0.445ms;  
          minTime = 0.098ms;            
                              maxTime = 2.888ms;            maxTime = 5.962ms;  
          maxTime = 5.348ms;            
                              meanTime = 0.078915ms;        meanTime = 
0.495536ms;        meanTime = 0.106158ms;        
                              stdDevTime = 0.034564ms;      stdDevTime = 
0.145759ms;      stdDevTime = 0.050933ms;      
                              Speed = 12671.781 /sec        Speed = 2018.0138 
/sec        Speed = 9419.881 /sec         
                              Rate = 152.06139 MB/s         Rate = 24.216167 
MB/s         Rate = 113.03858 MB/s   

org.apache.mahout.common.distance.TanimotoDistanceMeasure                       
                                                                   
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 5.2595s;            sumTime = 
25.087902s;         sumTime = 24.759318s;         
                              minTime = 0.246ms;            minTime = 1.149ms;  
          minTime = 1.118ms;            
                              maxTime = 7.092ms;            maxTime = 13.394ms; 
          maxTime = 7.952ms;            
                              meanTime = 0.262975ms;        meanTime = 
1.254395ms;        meanTime = 1.237965ms;        
                              stdDevTime = 0.073059ms;      stdDevTime = 
0.271826ms;      stdDevTime = 0.123637ms;      
                              Speed = 3802.6428 /sec        Speed = 797.19696 
/sec        Speed = 807.7767 /sec         
                              Rate = 45.631714 MB/s         Rate = 9.566364 
MB/s          Rate = 9.69332 MB/s  
org.apache.mahout.common.distance.TanimotoDistanceMeasure                       
                                                                   
                              nCalls = 20000;               nCalls = 20000;     
          nCalls = 20000;               
                              sumTime = 1.567412s;          sumTime = 
9.751068s;          sumTime = 2.101077s;          
                              minTime = 0.072ms;            minTime = 0.442ms;  
          minTime = 0.098ms;            
                              maxTime = 2.575ms;            maxTime = 8.815ms;  
          maxTime = 2.138ms;            
                              meanTime = 0.07837ms;         meanTime = 
0.487553ms;        meanTime = 0.105053ms;        
                              stdDevTime = 0.031039ms;      stdDevTime = 
0.171094ms;      stdDevTime = 0.028441ms;      
                              Speed = 12759.887 /sec        Speed = 2051.0574 
/sec        Speed = 9518.928 /sec         
                              Rate = 153.11865 MB/s         Rate = 24.61269 
MB/s          Rate = 114.227135 MB/s        

{noformat}
Result:


There is no difference in Manhattan distance, as its optimised for diff impls

> Solve performance issues with Vector Implementations
> ----------------------------------------------------
>
>                 Key: MAHOUT-300
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-300
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-300.patch, MAHOUT-300.patch
>
>
> AbstractVector operations like times
>   public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       int index = element.index();
>       result.setQuick(index, element.get() * x);
>     }
>     return result;
>   }
> should be implemented as follows
>  public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = result.iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       element.set(element.get() * x);
>     }
>     return result;
>   }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to