[ 
https://issues.apache.org/jira/browse/MAHOUT-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1780:
----------------------------------
    Description: 
Capturing here the Conversation on this subject:

{code}

Turns out that matrix view traversal (of dense matrices, anyway) is 4 times 
slower than regular matrix traversal in the same direction. I.e.

Ad %*% Bd: (106.33333333333333,85.0)
Ad(r,::) %*% Bd: (356.0,328.0)

where r=0 until Ad.nrow.

Investigated MatrixView: it reports the correct matrix flavor (the owner's), and 
the correct algorithm is selected (the same as for the row above). Sure, MatrixView 
adds an indirection (sometimes even a double indirection), but 4x?? It should not 
be that much different from the transpose view overhead, and the transpose view 
overhead is very small in these tests (compared to the rest of the cost).

The main difference seems to be that the algorithm over matrices ends up doing 
a dot over a DenseVector and a DenseVector (even though the wrapper object is 
created inside the row iterations), whereas the inefficient algorithm does the 
same over VectorView wrappers. I wonder if VectorView has not been equipped to 
pass on the flavors of its backing vector to the vector-vector optimization.

Apparently the dot algorithm on a vector view goes through the in-core vector-vector 
optimization framework (calls aggregate()), whereas DenseVector applies a custom 
iteration. Hence it may boil down to experiments comparing avec dot bvec vs. 
avec(::) dot bvec(::).

{code}
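The suggested avec dot bvec vs. avec(::) dot bvec(::) experiment can be sketched outside of Mahout. The DenseVec and VecView classes below are hypothetical stand-ins, not Mahout's actual DenseVector/VectorView; they only model the extra per-element indirection that a view layers over its backing vector:

```scala
// Hypothetical stand-ins (NOT Mahout classes) modeling the suspected overhead:
// a view pays one extra virtual apply() call per element on every access.
trait Vec {
  def size: Int
  def apply(i: Int): Double

  // Naive dot product: one virtual element access per operand per step.
  def dot(that: Vec): Double = {
    var s = 0.0
    var i = 0
    while (i < size) { s += this(i) * that(i); i += 1 }
    s
  }
}

// Direct backing-array access, analogous to a dense vector.
class DenseVec(val a: Array[Double]) extends Vec {
  def size: Int = a.length
  def apply(i: Int): Double = a(i)
}

// Offset view over another vector: one extra level of index indirection.
class VecView(backing: Vec, off: Int, len: Int) extends Vec {
  def size: Int = len
  def apply(i: Int): Double = backing(off + i)
}

object ViewDotBench {
  def time(f: => Double): (Double, Long) = {
    val t0 = System.nanoTime(); val r = f; (r, System.nanoTime() - t0)
  }

  def main(args: Array[String]): Unit = {
    val n = 1 << 20
    val a = new DenseVec(Array.tabulate(n)(_ * 1e-6))
    val b = new DenseVec(Array.tabulate(n)(i => (n - i) * 1e-6))
    val av = new VecView(a, 0, n) // full-range view, like avec(::)
    val bv = new VecView(b, 0, n)

    // Warm-up; dense and view dots must agree, only timings differ.
    for (_ <- 0 until 3) { a dot b; av dot bv }
    val (d1, t1) = time(a dot b)
    val (d2, t2) = time(av dot bv)
    assert(math.abs(d1 - d2) < 1e-6)
    println(s"dense dot: $t1 ns, view dot: $t2 ns")
  }
}
```

Absolute timings depend on JIT warm-up and hardware; the point of the sketch is only that every view access routes through an additional virtual call, which is the kind of per-element cost the conversation above suspects in VectorView.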


> Multi-threaded Matrix Multiplication is slower than Single-thread variant
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1780
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1780
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.10.0, 0.10.1, 0.10.2, 0.11.0
>            Reporter: Suneel Marthi
>            Assignee: Dmitriy Lyubimov
>            Priority: Critical
>             Fix For: 0.12.0, 0.13.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
