[ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577181#action_12577181 ]

Jason Rennie commented on MAHOUT-6:
-----------------------------------

David's right, I'm wrong: a primitive implementation didn't help with speed.
I tested pcj's IntKeyDoubleOpenHashMap against java.util.HashMap<Integer,Double>,
trying two load factors, 25% and 50% (I set the initial capacity to
nnz/loadFactor).  The two maps were roughly comparable at 25%;
java.util.HashMap was faster at 50%.  The CRS implementation was 2.3-3.6x
faster than java.util.HashMap (after subtracting the StopWatch "base" times).

The details:

        /**
         * <ul>
         * <li> Finished HashMap dot product in 6.501 seconds
         * <li> Finished PrimitiveMap dot product in 6.589 seconds
         * <li> Finished CRS dot product in 2.632 seconds
         * <li> Finished SW base time in 1.162 seconds
         * <li> numTrials=1000000 vectorSize=1000 nnz1=100 nnz2=100 loadFactor=25%
         * </ul>
         * <ul>
         * <li> Finished HashMap dot product in 4.573 seconds
         * <li> Finished PrimitiveMap dot product in 4.033 seconds
         * <li> Finished CRS dot product in 2.290 seconds
         * <li> Finished SW base time in 1.103 seconds
         * <li> numTrials=1000000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=25%
         * </ul>
         * <ul>
         * <li> Finished HashMap dot product in 6.244 seconds
         * <li> Finished PrimitiveMap dot product in 7.417 seconds
         * <li> Finished CRS dot product in 2.575 seconds
         * <li> Finished SW base time in 1.054 seconds
         * <li> numTrials=1000000 vectorSize=1000 nnz1=100 nnz2=100 loadFactor=50%
         * </ul>
         * <ul>
         * <li> Finished HashMap dot product in 3.745 seconds
         * <li> Finished PrimitiveMap dot product in 4.249 seconds
         * <li> Finished CRS dot product in 2.270 seconds
         * <li> Finished SW base time in 1.164 seconds
         * <li> numTrials=1000000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=50%
         * </ul>
         */
        public void testSparseVectorPerformance() throws Exception {
                StopWatch hmvSW = new StopWatch("HashMap dot product", log, false);
                StopWatch pmvSW = new StopWatch("PrimitiveMap dot product", log, false);
                StopWatch crsSW = new StopWatch("CRS dot product", log, false);
                StopWatch baseSW = new StopWatch("SW base time", log, false);
                final int numTrials = 1000000;
                final int vectorSize = 1000;
                final int nnz1 = 100;
                final int nnz2 = 100;
                final int sizeMultiple = 2;
                for (int i = 0; i < numTrials; ++i) {
                        // ignore first 10% of iterations
                        if (i == numTrials / 10) {
                                hmvSW.reset();
                                pmvSW.reset();
                                crsSW.reset();
                                baseSW.reset();
                        }
                        SparseVectorPrimitiveMap pmv1 = new SparseVectorPrimitiveMap(nnz1 * sizeMultiple);
                        SparseVectorPrimitiveMap pmv2 = new SparseVectorPrimitiveMap(nnz2 * sizeMultiple);
                        SparseVectorHashMap hmv1 = new SparseVectorHashMap(nnz1 * sizeMultiple);
                        SparseVectorHashMap hmv2 = new SparseVectorHashMap(nnz2 * sizeMultiple);
                        for (int j = 0; j < nnz1; ++j) {
                                int index = this.rand.nextInt(vectorSize) + 1;
                                double value = this.rand.nextDouble();
                                pmv1.set(index, value);
                                hmv1.set(index, value);
                        }
                        for (int j = 0; j < nnz2; ++j) {
                                int index = this.rand.nextInt(vectorSize) + 1;
                                double value = this.rand.nextDouble();
                                pmv2.set(index, value);
                                hmv2.set(index, value);
                        }
                        SparseVector crsv1 = pmv1.buildSparseVector();
                        SparseVector crsv2 = pmv2.buildSparseVector();
                        hmvSW.start();
                        hmv1.dot(hmv2);
                        hmvSW.stop();
                        pmvSW.start();
                        pmv1.dot(pmv2);
                        pmvSW.stop();
                        crsSW.start();
                        crsv1.dot(crsv2);
                        crsSW.stop();
                        baseSW.start();
                        baseSW.stop();
                }
                hmvSW.logEndMessage();
                pmvSW.logEndMessage();
                crsSW.logEndMessage();
                baseSW.logEndMessage();
                log.debug("numTrials=" + numTrials + " vectorSize=" + vectorSize
                                + " nnz1=" + nnz1 + " nnz2=" + nnz2
                                + " loadFactor=" + (100 / sizeMultiple) + "%");
        }
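
For context on why CRS wins here: the dot product over two CRS-style vectors is a
linear merge of their sorted index arrays, with no hashing or boxing per element.
A minimal sketch of that merge (illustrative class and method names, not the code
in the attached patches):

```java
// Sketch of a CRS-style sparse dot product: each vector is a pair of
// parallel arrays (sorted indices, values); the dot is a two-cursor merge.
// Names here are hypothetical, not the classes from the MAHOUT-6 patches.
public final class CrsDotDemo {
        static double dot(int[] idx1, double[] val1, int[] idx2, double[] val2) {
                double sum = 0.0;
                int i = 0, j = 0;
                // Advance whichever cursor holds the smaller index;
                // multiply and accumulate only when the indices match.
                while (i < idx1.length && j < idx2.length) {
                        if (idx1[i] == idx2[j]) {
                                sum += val1[i++] * val2[j++];
                        } else if (idx1[i] < idx2[j]) {
                                ++i;
                        } else {
                                ++j;
                        }
                }
                return sum;
        }

        public static void main(String[] args) {
                int[] ia = {1, 4, 7};
                double[] va = {2.0, 3.0, 5.0};
                int[] ib = {4, 7, 9};
                double[] vb = {10.0, 1.0, 4.0};
                // Matches at indices 4 and 7: 3*10 + 5*1
                System.out.println(dot(ia, va, ib, vb)); // prints 35.0
        }
}
```

The merge touches each stored entry at most once, which is consistent with the
2-3x gap over the hash-based vectors measured above.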


> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>            Assignee: Grant Ingersoll
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, 
> MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, 
> MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff, 
> MAHOUT-6l.patch
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different 
> reducers
> g) a reasonable set of matrix operations should be supported, these should 
> eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra 
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + 
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations
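
As a concrete reading of the level-2 primitive in requirement (g), here is a
hedged dense-array sketch of y = alpha*A*u + beta*v (plain arrays only;
this is illustrative, not the API proposed in the attached patches):

```java
// Illustrative dense level-2 BLAS-style primitive: y = alpha*A*u + beta*v.
// Uses bare arrays; the Gaxpy name and signature are hypothetical.
public final class Gaxpy {
        static double[] gaxpy(double alpha, double[][] a, double[] u,
                        double beta, double[] v) {
                double[] y = new double[a.length];
                for (int i = 0; i < a.length; ++i) {
                        // Row-times-vector dot product for row i of A.
                        double dot = 0.0;
                        for (int j = 0; j < u.length; ++j) {
                                dot += a[i][j] * u[j];
                        }
                        y[i] = alpha * dot + beta * v[i];
                }
                return y;
        }

        public static void main(String[] args) {
                double[][] a = {{1, 2}, {3, 4}};
                double[] u = {1, 1};
                double[] v = {10, 20};
                // A*u = {3, 7}, so y = 2*{3, 7} + 1*{10, 20} = {16, 34}
                System.out.println(java.util.Arrays.toString(
                                gaxpy(2.0, a, u, 1.0, v))); // prints [16.0, 34.0]
        }
}
```

A sparse implementation would replace the inner loop with an iteration over the
stored entries of row i, which is where requirement (h)'s efficient iteration
constructs come in.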

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
