[
https://issues.apache.org/jira/browse/MAHOUT-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189151#comment-16189151
]
ASF GitHub Bot commented on MAHOUT-2019:
----------------------------------------
GitHub user pferrel opened a pull request:
https://github.com/apache/mahout/pull/342
MAHOUT-2019 Sparse speedup
### Purpose of PR:
To review an apparent speedup of spark-itemsimilarity and the underlying
SimilarityAnalysis.cooccurrence, obtained by using iterateNonZero instead of
the previous for loops in SparseRowMatrix.
For discussion only at present.
MAHOUT-2019
https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2019?filter=allopenissues&orderby=priority+DESC%2C+updated+DESC
### Important ToDos
Please mark each with an "x"
- [x] A JIRA ticket exists (if not, please create this
first)[https://issues.apache.org/jira/browse/ZEPPELIN/]
- [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX
is the JIRA number.
- [ ] Created unit tests where appropriate
- [ ] Added correct licenses on newly added files
- [ ] Assigned JIRA to self
- [ ] Added documentation in scala docs/java docs, and to website
- [ ] Successfully built and ran all unit tests, verified that all tests
pass locally.
If all of these things aren't complete, but you still feel it is
appropriate to open a PR, please add [WIP] after MAHOUT-XXXX before the
descriptions- e.g. "MAHOUT-XXXX [WIP] Description of Change"
Does this change break earlier versions?
Is this the beginning of a larger project for which a feature branch should
be made?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pferrel/mahout sparse-speedup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/mahout/pull/342.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #342
----
commit 26a2efa65e9f09df358e1021ebf45e3735e2ec6c
Author: pferrel <[email protected]>
Date: 2017-10-02T18:39:54Z
minimum speedup fix
commit 9330a2ed6d1211459c57863a5d664377c55aa747
Author: pferrel <[email protected]>
Date: 2017-10-02T19:27:47Z
minimum speedup fix with cast exception check
commit 722bd11f01e7250f99f21f17ec7211bf5abb2089
Author: pferrel <[email protected]>
Date: 2017-10-02T20:33:07Z
added cast exception logging to SparseRowMatrix
commit 02700ef13c44e403cba58288dcbab5cfabed8585
Author: pferrel <[email protected]>
Date: 2017-10-02T20:35:14Z
Merge branch 'master' into sparse-speedup
----
> SparseRowMatrix assign ops use for loops instead of iterateNonZero and so
> can be optimized
> -------------------------------------------------------------------------------------------
>
> Key: MAHOUT-2019
> URL: https://issues.apache.org/jira/browse/MAHOUT-2019
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.13.0
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Fix For: 0.13.1
>
>
> DRMs get blockified into SparseRowMatrix instances if the density is low. But
> SRM inherits the implementation of methods like "assign" from AbstractMatrix,
> which uses nested for loops to traverse rows. For multiplying 2 matrices that
> are extremely sparse, the kind of data you see in collaborative filtering,
> this is extremely wasteful of execution time. Better to use a sparse vector's
> iterateNonZero Iterator for some function types.
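
The difference described above can be illustrated with a minimal, self-contained Java sketch. This is not the Mahout code or the patch in the pull request: SparseRow, assignDense and assignNonZero are hypothetical names invented for illustration, and the sparse row is modelled with a plain TreeMap rather than a Mahout Vector. The sketch also assumes that "some function types" means functions that map zero to zero, since only those leave the implicit zeros untouched; anything else falls back to the dense loop.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

public class SparseAssignSketch {

    // Hypothetical stand-in for one sparse row: column index -> non-zero value.
    static final class SparseRow {
        final int size;
        final TreeMap<Integer, Double> nonZeros = new TreeMap<>();

        SparseRow(int size) { this.size = size; }

        double get(int col) { return nonZeros.getOrDefault(col, 0.0); }

        void set(int col, double value) {
            if (value == 0.0) nonZeros.remove(col); else nonZeros.put(col, value);
        }
    }

    // Dense traversal: touches every column, including all the implicit zeros.
    // Returns the number of cells visited.
    static int assignDense(SparseRow row, DoubleUnaryOperator f) {
        int visited = 0;
        for (int col = 0; col < row.size; col++) {
            row.set(col, f.applyAsDouble(row.get(col)));
            visited++;
        }
        return visited;
    }

    // Sparse traversal: touches only stored entries. This is only correct when
    // f(0) == 0, so any other function falls back to the dense loop.
    static int assignNonZero(SparseRow row, DoubleUnaryOperator f) {
        if (f.applyAsDouble(0.0) != 0.0) {
            return assignDense(row, f);
        }
        int visited = 0;
        Iterator<Map.Entry<Integer, Double>> it = row.nonZeros.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Double> entry = it.next();
            double value = f.applyAsDouble(entry.getValue());
            if (value == 0.0) it.remove(); else entry.setValue(value);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        SparseRow a = new SparseRow(1_000_000);
        SparseRow b = new SparseRow(1_000_000);
        for (int col : new int[] {12, 40_000, 999_999}) {
            a.set(col, 1.0);
            b.set(col, 1.0);
        }
        DoubleUnaryOperator timesTwo = x -> x * 2.0;
        System.out.println("dense cells visited:  " + assignDense(a, timesTwo));
        System.out.println("sparse cells visited: " + assignNonZero(b, timesTwo));
    }
}
```

With rows as sparse as those in collaborative filtering data, the dense loop does work proportional to the row width while the sparse loop does work proportional to the number of stored entries, which is the saving the issue is after.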
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)