[ 
https://issues.apache.org/jira/browse/MAHOUT-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1780.
-----------------------------------
    Resolution: Duplicate

Duplicate of MAHOUT-1781

> Multi-threaded Matrix Multiplication is slower than Single-thread variant
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1780
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1780
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.10.0, 0.10.1, 0.10.2, 0.11.0
>            Reporter: Suneel Marthi
>            Assignee: Dmitriy Lyubimov
>            Priority: Critical
>              Labels: performance
>             Fix For: 0.12.0, 0.13.0
>
>
> Capturing the Conversation on the subject here:
> {code}
> Turns out that matrix view traversal (of dense matrices, anyway) is 4 times 
> slower than regular matrix traversal in the same direction. I.e.
> Ad %*% Bd: (106.33333333333333,85.0)
> Ad(r,::) %*% Bd: (356.0,328.0)
> where r=0 until Ad.nrow.
> On investigating MatrixView, it does report correct matrix flavor (as the 
> owner's) and correct algorithm is selected (the same as for the row above). 
> MatrixView adds an indirection (sometimes even a double indirection), but 
> that still doesn't explain the 4x performance degradation. It should not be 
> much different from the transpose view overhead, and the transpose view 
> overhead is very small in the tests (compared to the rest of the cost).
> The main difference seems to be that the algorithm over matrices ends up 
> doing a dot over a DenseVector and a DenseVector (even though the wrapper 
> object is created inside the row iterations), whereas the inefficient 
> algorithm does the same over VectorView wrappers. I wonder if VectorView has 
> not been equipped to pass on the flavors of its backing vector to the 
> vector-vector optimization.
> Apparently the dot algorithm on a vector view goes to the in-core 
> vector-vector optimization framework (calls aggregate()), but DenseVector 
> applies custom iteration. Hence it may boil down to experiments comparing 
> avec dot bvec with avec(::) dot bvec(::). 
> {code}
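The suspected cause above — a view wrapper adding per-element indirection to an otherwise dense dot product — can be sketched outside of Mahout. The following is a minimal, self-contained illustration, not Mahout code: Vec, DenseVec, and VecView are hypothetical stand-ins for Vector, DenseVector, and VectorView, showing how routing every element access through a delegating view changes the hot loop while computing the same result.

```java
// Hypothetical sketch of the indirection the report suspects (not Mahout's API).
interface Vec {
    int size();
    double get(int i);
}

// Dense storage: get() is a direct array read.
final class DenseVec implements Vec {
    private final double[] a;
    DenseVec(double[] a) { this.a = a; }
    public int size() { return a.length; }
    public double get(int i) { return a[i]; }
}

// Analogue of a view: every access is delegated to the backing vector
// through an offset, adding a virtual call per element.
final class VecView implements Vec {
    private final Vec backing;
    private final int offset, len;
    VecView(Vec backing, int offset, int len) {
        this.backing = backing;
        this.offset = offset;
        this.len = len;
    }
    public int size() { return len; }
    public double get(int i) { return backing.get(offset + i); }
}

public class ViewDotSketch {
    // Same dot-product loop for both paths; only the element access differs.
    static double dot(Vec x, Vec y) {
        double s = 0.0;
        for (int i = 0; i < x.size(); i++) s += x.get(i) * y.get(i);
        return s;
    }

    public static void main(String[] args) {
        int n = 1000;
        double[] raw = new double[n];
        for (int i = 0; i < n; i++) raw[i] = i;
        Vec a = new DenseVec(raw), b = new DenseVec(raw);
        Vec av = new VecView(a, 0, n), bv = new VecView(b, 0, n);
        // Both paths compute the identical value; the view path is the one
        // the report suspects pays for per-element indirection.
        System.out.println(dot(a, b) == dot(av, bv));
    }
}
```

In Mahout terms, the experiment suggested above (avec dot bvec vs. avec(::) dot bvec(::)) isolates exactly this access-path difference, since both expressions denote the same mathematical result.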



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
