GitHub user takuti opened a pull request:
https://github.com/apache/incubator-hivemall/pull/84
[WIP][HIVEMALL-19] Support DIMSUM for approx. all-pairs similarity
## What changes were proposed in this pull request?
Support DIMSUM (Dimension Independent Matrix Square using MapReduce) for
approximate all-pairs similarity computation. Avoiding an exact similarity
computation for every item pair makes item-based collaborative filtering (CF)
more efficient.
https://stanford.edu/~rezab/papers/dimsum.pdf
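As a rough illustration of the technique in the linked paper, the following standalone Java sketch uses the paper's pairwise-sampling formulation. For each row, the pair (a_ij, a_ik) is emitted with probability min(1, gamma / (||c_j|| ||c_k||)), and the reducer divides each pair's sum by min(gamma, ||c_j|| ||c_k||) to estimate cosine similarity. This is not the Hivemall code in this PR. The class and method names (`DimsumSketch`, `columnNorms`, `map`, `reduce`) are hypothetical, and rows are dense arrays. The default `gamma = 10 * ln(numCols) / threshold` is an assumption modeled on Spark MLlib's `RowMatrix.columnSimilarities(threshold)`, not taken from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class DimsumSketch {

    /** One mapper output record: column pair (j, k) and the product a_ij * a_ik. */
    static final class Emit {
        final int j;
        final int k;
        final double value;

        Emit(int j, int k, double value) {
            this.j = j;
            this.k = k;
            this.value = value;
        }
    }

    /** Exact L2 norm of every column; DIMSUM takes these as input. */
    static double[] columnNorms(double[][] rows, int numCols) {
        double[] norms = new double[numCols];
        for (double[] row : rows) {
            for (int c = 0; c < numCols; c++) {
                norms[c] += row[c] * row[c]; // accumulate sum of squares
            }
        }
        for (int c = 0; c < numCols; c++) {
            norms[c] = Math.sqrt(norms[c]);
        }
        return norms;
    }

    /** Mapper: sample each pair of non-zero entries with prob. min(1, gamma / (|c_j| |c_k|)). */
    static List<Emit> map(double[] row, double[] norms, double gamma, Random rnd, boolean symmetric) {
        List<Emit> out = new ArrayList<>();
        for (int j = 0; j < row.length; j++) {
            if (row[j] == 0.0) continue;
            for (int k = j + 1; k < row.length; k++) {
                if (row[k] == 0.0) continue;
                // Non-zero entries guarantee non-zero column norms, so no division by zero.
                double prob = Math.min(1.0, gamma / (norms[j] * norms[k]));
                if (rnd.nextDouble() < prob) {
                    double v = row[j] * row[k];
                    out.add(new Emit(j, k, v));
                    if (symmetric) {
                        out.add(new Emit(k, j, v)); // optional mirrored (k, j) output
                    }
                }
            }
        }
        return out;
    }

    /** Reducer: sum per pair, each term scaled by 1 / min(gamma, |c_j| |c_k|). */
    static Map<String, Double> reduce(List<Emit> emits, double[] norms, double gamma) {
        Map<String, Double> sims = new HashMap<>();
        for (Emit e : emits) {
            double scale = Math.min(gamma, norms[e.j] * norms[e.k]);
            sims.merge(e.j + "," + e.k, e.value / scale, Double::sum);
        }
        return sims;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 0, 2}, {0, 3, 4}, {5, 6, 0}};
        int numCols = 3;
        double[] norms = columnNorms(rows, numCols);
        double threshold = 0.5;
        // Assumption: Spark-MLlib-style default, not necessarily this PR's default gamma.
        double gamma = 10 * Math.log(numCols) / threshold;
        Random rnd = new Random(42);
        List<Emit> emits = new ArrayList<>();
        for (double[] row : rows) {
            emits.addAll(map(row, norms, gamma, rnd, false));
        }
        System.out.println(reduce(emits, norms, gamma));
    }
}
```

Setting `gamma` to `Double.POSITIVE_INFINITY` makes every sampling probability 1, so the output reduces to exact cosine similarity. A smaller `gamma` trades accuracy for fewer emitted pairs, which is where the savings for item-based CF come from.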
## What type of PR is it?
Feature
## What is the Jira issue?
- https://issues.apache.org/jira/browse/HIVEMALL-19
## How was this patch tested?
- Unit tests
- Manual tests on EMR
---
### TODO
- [ ] Documentation
- [ ] Evaluate on larger data, e.g., MovieLens
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/takuti/incubator-hivemall DIMSUM
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/84.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #84
----
commit 1a661cef229a508655352c360a2890bd66da1ab0
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-01T03:30:08Z
Add `l2_norm` UDAF
commit c19abc5b8e603b65595346c6fb76329a09a1e02c
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-01T09:10:16Z
Implement DIMSUM mapper
commit 44367b29056752b32bbbd9601e9500fa6398e8ef
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-02T01:58:40Z
Make symmetric output (j, k), (k, j) configurable
commit a6e854c856ce3deef46e6b8b0293497d57e82901
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-02T03:16:23Z
Support string feature
commit 97cb91d8fef0cd2f85657a02bd9a2505d7551337
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-02T03:28:22Z
Fix so that default `gamma` is computed correctly
commit b42b65b1cb358a89cd402f90b5ec3d6c79ff465c
Author: Takuya Kitazawa <[email protected]>
Date: 2017-06-02T07:04:25Z
Add unit test for DIMSUMMapperUDTF
----
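The first commit adds an `l2_norm` UDAF, which supplies the column norms DIMSUM needs. A minimal sketch of why such an aggregate must carry a sum of squares, not a norm, as its partial state: partial results from different splits combine by addition, and the square root is taken only once at the end. The class below only mirrors the `iterate` / `merge` / `terminate` lifecycle of Hive's `GenericUDAFEvaluator`. It does not use Hive's API, and the names `L2NormSketch` and `Partial` are hypothetical.

```java
public class L2NormSketch {

    /** Partial aggregation state: a running sum of squares that can be merged across splits. */
    static final class Partial {
        double sumSquares;

        /** Consume one input value. */
        void iterate(double x) {
            sumSquares += x * x;
        }

        /** Combine another split's partial result; sums of squares add, norms do not. */
        void merge(Partial other) {
            sumSquares += other.sumSquares;
        }

        /** Final result: the L2 norm. */
        double terminate() {
            return Math.sqrt(sumSquares);
        }
    }

    public static void main(String[] args) {
        // Values of one column split across two hypothetical mappers.
        double[] split1 = {3.0, 0.0};
        double[] split2 = {4.0};
        Partial a = new Partial();
        for (double x : split1) {
            a.iterate(x);
        }
        Partial b = new Partial();
        for (double x : split2) {
            b.iterate(x);
        }
        a.merge(b);
        System.out.println(a.terminate()); // prints 5.0
    }
}
```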