[jira] [Commented] (MAHOUT-1508) SparseMatrix assign has huge performance problems

Dmitriy Lyubimov (JIRA) Fri, 11 Apr 2014 08:38:45 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966657#comment-13966657
 ]


Dmitriy Lyubimov commented on MAHOUT-1508:
------------------------------------------

[~ssc] i am not so sure. SRM is fastest to construct from rows and fastest to 
iterate over rows (since it doesn't create vector views, and vector elements in 
this case are co-located in memory. 

if x cross y doesn't have efficient representation in SRM, then maybe it 
shouldn't use SRM. 

The real problem is that we don't have in-core algebraic matrix optimizer here 
that would look at the expression as a whole. I am not sure we would need one, 
but if we had, we'd be able to compute cost variations not only of the current 
operation, but of subsequent, too. But since we are constrainted by the cost of 
current operation only, we do what's fastest for the current operation 
(one-step greedy approach). so SRM is the fastest to wrap a bunch of rows, and 
so that's what blockify() uses. If subsequent operation is custom row-wise 
iterator, this choice is still the best. but for others, it may not -- but we 
have no way of knowing at the point we construct SRM. 

More likely, we might do what we did with aggregate() and dot() calls on 
vectors, i.e. make it operator-centric approach where operator chooses approach 
based on underlying matrix argument structures. in that sense we could 
introduce some organizational concept the same way we did for vectors. E.g. we 
could have methods isRowWiseLike, isColumnWiseLike, isSparseHashLike etc. etc. 
which would allow us to asses costs for certain iteration patterns. 

this would strike a middle ground between full-fledged optimizer and 
doing-nothing.

> SparseMatrix assign has huge performance problems
> -------------------------------------------------
>
>                 Key: MAHOUT-1508
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1508
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>            Reporter: Sebastian Schelter
>            Priority: Blocker
>             Fix For: 1.0
>
>
> I'm currently working with the new Scala DSL and running into a problem with 
> SparseMatrix and SparseRowMatrix.
> The problem is that they don't implement a specialized assign(Matrix other, 
> DoubleDoubleFunction f) function, but use the implementation from 
> AbstractMatrix which loops through all (!) possible entries. 
> We have to fix this asap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAHOUT-1508) SparseMatrix assign has huge performance problems

Reply via email to