[ 
https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020692#comment-14020692
 ] 

Dmitriy Lyubimov edited comment on MAHOUT-1574 at 6/7/14 4:48 AM:
------------------------------------------------------------------

I suggest reopening: 

(1) First, an added test is failing in the Jenkins build. Not sure whether 
legitimately or not.
(2) 

{code}
     if (other instanceof SparseRowMatrix) {
     .... 
{code}

is a good first step but still a bit naive. Not sure it solves Sebastian's 
case. While SRM %*% SRM is definitely one of the cases, there's also a handful 
of other cases (e.g. SRM %*% SCM.t, which is organizationally equivalent).

What I meant previously by the absence of a cost-based approach for in-core 
matrices was more along the lines of what Robin did for the vector 
optimizations. Instead of asking "Are you implementation A?", vector operations 
carry a cost-interrogating abstraction asking questions like "what is my cost 
of traversing non-zeros? what is my cost for random access?" 
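
To illustrate the idea, here is a minimal sketch of what such cost 
interrogation could look like for a dot product. This is not Mahout's actual 
API: the class VectorCostSketch, the Profile fields, the Plan names and the 
cost formulas are all assumptions made up for illustration.

```java
final class VectorCostSketch {

  /** Hypothetical self-reported cost profile of a vector implementation. */
  static final class Profile {
    final long nonZeros;       // number of stored (non-zero) elements
    final double advanceCost;  // cost of stepping the non-zero iterator once
    final double lookupCost;   // cost of one random-access get(index)
    final boolean sequential;  // true if non-zeros are iterated in index order

    Profile(long nonZeros, double advanceCost, double lookupCost, boolean sequential) {
      this.nonZeros = nonZeros;
      this.advanceCost = advanceCost;
      this.lookupCost = lookupCost;
      this.sequential = sequential;
    }
  }

  enum Plan { ITERATE_LHS_LOOKUP_RHS, ITERATE_RHS_LOOKUP_LHS, MERGE_BOTH }

  /** Picks the cheapest plan by asking both operands about their costs. */
  static Plan choosePlan(Profile lhs, Profile rhs) {
    // Walk the non-zeros of one side and do a random lookup in the other.
    double best = lhs.nonZeros * (lhs.advanceCost + rhs.lookupCost);
    Plan plan = Plan.ITERATE_LHS_LOOKUP_RHS;

    double alt = rhs.nonZeros * (rhs.advanceCost + lhs.lookupCost);
    if (alt < best) {
      best = alt;
      plan = Plan.ITERATE_RHS_LOOKUP_LHS;
    }

    // A merge of two ordered iterators needs no lookups at all,
    // but is only possible when both sides iterate in index order.
    if (lhs.sequential && rhs.sequential) {
      double merge = lhs.nonZeros * lhs.advanceCost + rhs.nonZeros * rhs.advanceCost;
      if (merge < best) {
        plan = Plan.MERGE_BOTH;
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    // Assumed numbers: a sequential sparse vector with binary-search lookups
    // and a dense vector with constant-time lookups.
    Profile sparse = new Profile(1_000, 1.0, 10.0, true);
    Profile dense = new Profile(1_000_000, 1.0, 1.0, true);
    System.out.println(choosePlan(sparse, dense));   // ITERATE_LHS_LOOKUP_RHS
    System.out.println(choosePlan(sparse, sparse));  // MERGE_BOTH
  }
}
```

Note that the decision never asks what class either operand is, only what 
its traversal and lookup costs are.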

Also, in the current vector optimizations, this process is not a property 
(method) of the LHS, but rather interrogates both LHS and RHS. Similarly, I 
guess matrices could be interrogated along the lines of "what is the cost of a 
row-wise non-zero traversal", "what is the cost of a column-wise traversal", 
"which traversal plan gives the best element locality", etc.
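
A rough sketch of how the same idea might carry over to matrices follows. 
Again, everything here is hypothetical (MatrixPlanSketch, Profile, the three 
Plan names, and the deliberately simplistic additive cost model); it only 
shows that once operands report traversal costs, SRM %*% SRM and 
SRM %*% SCM.t end up with the same plan without any instanceof checks.

```java
final class MatrixPlanSketch {

  /** Hypothetical per-element traversal costs reported by a matrix. */
  static final class Profile {
    final double rowWiseCost;     // cost per non-zero when walking along rows
    final double columnWiseCost;  // cost per non-zero when walking along columns

    Profile(double rowWiseCost, double columnWiseCost) {
      this.rowWiseCost = rowWiseCost;
      this.columnWiseCost = columnWiseCost;
    }

    /** A transposed view simply swaps the two traversal costs. */
    Profile transposed() {
      return new Profile(columnWiseCost, rowWiseCost);
    }
  }

  enum Plan {
    ROW_BY_ROW,        // C(i,:) += A(i,k) * B(k,:)  needs rows of A, rows of B
    ROW_DOT_COLUMN,    // C(i,j) = A(i,:) . B(:,j)   needs rows of A, columns of B
    COLUMN_OUTER_ROW   // C += A(:,k) * B(k,:)       needs columns of A, rows of B
  }

  /** Toy cost model: sum of the traversal costs each plan requires. */
  static double cost(Plan plan, Profile lhs, Profile rhs) {
    switch (plan) {
      case ROW_BY_ROW:
        return lhs.rowWiseCost + rhs.rowWiseCost;
      case ROW_DOT_COLUMN:
        return lhs.rowWiseCost + rhs.columnWiseCost;
      case COLUMN_OUTER_ROW:
        return lhs.columnWiseCost + rhs.rowWiseCost;
      default:
        throw new IllegalArgumentException("unknown plan: " + plan);
    }
  }

  /** Interrogates both operands and returns the cheapest plan. */
  static Plan choosePlan(Profile lhs, Profile rhs) {
    Plan best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (Plan plan : Plan.values()) {
      double c = cost(plan, lhs, rhs);
      if (c < bestCost) {
        bestCost = c;
        best = plan;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Assumed numbers: cheap traversal along the storage axis, expensive across it.
    Profile srm = new Profile(1.0, 50.0);  // sparse row matrix
    Profile scm = new Profile(50.0, 1.0);  // sparse column matrix
    System.out.println(choosePlan(srm, srm));               // ROW_BY_ROW
    System.out.println(choosePlan(srm, scm.transposed()));  // ROW_BY_ROW
    System.out.println(choosePlan(srm, scm));               // ROW_DOT_COLUMN
  }
}
```

A real cost model would also have to account for non-zero counts and element 
locality; the point here is only the shape of the abstraction.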




> SparseRowMatrix needs performance improvement for times()
> ---------------------------------------------------------
>
>                 Key: MAHOUT-1574
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1574
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Ted Dunning
>
> According to ssc,
> > * SparseRowMatrix with sequential vectors times SparseRowMatrix with
> > sequential vectors is totally broken, it uses three nested loops and uses
> > get(row, col) on the matrices, which internally uses binary search...
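
For contrast with the get(row, col) pattern described above, a row-wise 
product that only iterates stored non-zeros might look roughly like the 
sketch below. It does not use Mahout classes: the Row type (parallel index 
and value arrays), the dense double[][] result and the method name are 
assumptions chosen to keep the example self-contained.

```java
final class SparseRowTimesSketch {

  /** Hypothetical sparse row: parallel arrays of column indices and values. */
  static final class Row {
    final int[] indices;
    final double[] values;

    Row(int[] indices, double[] values) {
      this.indices = indices;
      this.values = values;
    }
  }

  /**
   * Computes C = A * B where both operands are stored as sparse rows.
   * Only stored non-zeros are visited; there are no random-access lookups.
   */
  static double[][] times(Row[] a, Row[] b, int bColumns) {
    double[][] c = new double[a.length][bColumns];
    for (int i = 0; i < a.length; i++) {
      Row ai = a[i];
      for (int p = 0; p < ai.indices.length; p++) {
        int k = ai.indices[p];      // column of A, which selects row k of B
        double aik = ai.values[p];
        Row bk = b[k];
        for (int q = 0; q < bk.indices.length; q++) {
          // C(i, j) += A(i, k) * B(k, j) for every stored B(k, j)
          c[i][bk.indices[q]] += aik * bk.values[q];
        }
      }
    }
    return c;
  }

  public static void main(String[] args) {
    // A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
    Row[] a = {
      new Row(new int[] {0}, new double[] {1.0}),
      new Row(new int[] {1}, new double[] {2.0})
    };
    Row[] b = {
      new Row(new int[] {1}, new double[] {3.0}),
      new Row(new int[] {0}, new double[] {4.0})
    };
    double[][] c = times(a, b, 2);
    System.out.println(java.util.Arrays.deepToString(c));  // [[0.0, 3.0], [8.0, 0.0]]
  }
}
```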


