GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/84

    [WIP][HIVEMALL-19] Support DIMSUM for approx. all-pairs similarity

    ## What changes were proposed in this pull request?
    
    Support DIMSUM (Dimension Independent Matrix Square using MapReduce) for
    approximate all-pairs similarity computation. This makes item-based
    collaborative filtering (CF) more efficient.
    
    https://stanford.edu/~rezab/papers/dimsum.pdf
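
For readers unfamiliar with the technique, here is a minimal single-process sketch of the DIMSUM mapper/reducer sampling scheme as described in the linked paper. It is not the Hivemall UDTF from this pull request: the function names (`column_norms`, `dimsum_map`, `dimsum_reduce`, `dimsum`) and the sparse-dict row format are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


def column_norms(rows):
    """L2 norm of each column; every row is a sparse dict {column: nonzero value}."""
    squares = defaultdict(float)
    for row in rows:
        for j, v in row.items():
            squares[j] += v * v
    return {j: math.sqrt(s) for j, s in squares.items()}


def dimsum_map(row, norms, gamma, rng):
    """Emit ((j, k), a_ij * a_ik) with probability min(1, gamma / (|c_j| * |c_k|))."""
    cols = sorted(row)
    for idx, j in enumerate(cols):
        for k in cols[idx + 1:]:
            p = min(1.0, gamma / (norms[j] * norms[k]))
            if rng.random() < p:
                yield (j, k), row[j] * row[k]


def dimsum_reduce(pair, values, norms, gamma):
    """Rescale the summed products into a cosine-similarity estimate."""
    j, k = pair
    denom = norms[j] * norms[k]
    # If the pair was always kept (p == 1), normalize by the column norms;
    # otherwise the sampling rate already carries 1 / denom, so divide by gamma.
    scale = 1.0 / denom if gamma / denom > 1.0 else 1.0 / gamma
    return scale * sum(values)


def dimsum(rows, gamma, seed=0):
    """Hypothetical driver: run the map and reduce phases in one process."""
    rng = random.Random(seed)
    norms = column_norms(rows)
    grouped = defaultdict(list)
    for row in rows:
        for pair, product in dimsum_map(row, norms, gamma, rng):
            grouped[pair].append(product)
    return {pair: dimsum_reduce(pair, vals, norms, gamma)
            for pair, vals in grouped.items()}
```

In this sketch, passing `gamma=float("inf")` keeps every pair and returns exact cosine similarities, while a smaller `gamma` emits fewer pairs in exchange for a noisier estimate.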
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    - https://issues.apache.org/jira/browse/HIVEMALL-19
    
    ## How was this patch tested?
    
    - Unit tests
    - Manual tests on EMR
    
    ---
    
    ### TODO
    
    - [ ] Documentation
    - [ ] Evaluate on larger data e.g. MovieLens

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall DIMSUM

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/84.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #84
    
----
commit 1a661cef229a508655352c360a2890bd66da1ab0
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-01T03:30:08Z

    Add `l2_norm` UDAF

commit c19abc5b8e603b65595346c6fb76329a09a1e02c
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-01T09:10:16Z

    Implement DIMSUM mapper

commit 44367b29056752b32bbbd9601e9500fa6398e8ef
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-02T01:58:40Z

    Make symmetric output (j, k), (k, j) configurable

commit a6e854c856ce3deef46e6b8b0293497d57e82901
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-02T03:16:23Z

    Support string feature

commit 97cb91d8fef0cd2f85657a02bd9a2505d7551337
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-02T03:28:22Z

    Fix so that default `gamma` is computed correctly

commit b42b65b1cb358a89cd402f90b5ec3d6c79ff465c
Author: Takuya Kitazawa <[email protected]>
Date:   2017-06-02T07:04:25Z

    Add unit test for DIMSUMMapperUDTF

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
