GitHub user jaxony opened a pull request: https://github.com/apache/incubator-hivemall/pull/167

[HIVEMALL-220] Implement Cofactor

## What changes were proposed in this pull request?

Implemented new matrix factorization algorithm for the recommendation problem.

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-220

## How was this patch tested?

Unit tests and manual testing on ML20M in a Hive dev environment

## How to use this feature?

TODO

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [ ] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaxony/incubator-hivemall feature/cofactor-feature-array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/167.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #167

----
commit 118cdb531c1d809a889b51fc8fd8e3a471ab4352
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T05:59:09Z

    feat: implementing CofactorModel as subclass of FactorizedModel

commit 58ab28cee1dd17b2ae50e5aca866d6d5a3e6bbf0
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T06:08:20Z

    feat: implemented getter and setter for contextBias

commit 8430caefc79b0667dc32968877beed62a11b3ab7
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T08:17:07Z

    feat: added co-occurrence matrix accumulation

commit c06378a8d22792c13f23e0a08073f5ee1f813172
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T08:57:17Z

    CofactorModel: add hyperparameters c0 and c1

commit 41a3f001d819ac588c8f1e8a3034696384fc57e8
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T08:58:07Z

    CofactorModel: c hyperparameters are final

commit d1eee785cd67e797d46de6d11f5961b92413e680
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T09:20:13Z

    CofactorModel: Change c0 and c1 to float

commit 800d7ca80f91e1f3d10ce9c68bb4df2d80ddbde6
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T09:20:38Z

    WIP: Implementing cofactor UDTF

commit ee76755ffab07ad004a9b2768cc01f3d52db5ad2
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T09:27:52Z

    CofactorizationUDTF: rename scaling parameters

commit 43f15f97d91e1608baa114ce3d7aa9aa62ab0e2d
Author: Jackson Huang <huang.j@...>
Date:   2018-09-20T09:28:52Z

    CofactorizationUDTF: Implement option parsing for cofactorization options

commit 4a81f0733a8bc03461e1e8db32b571d57b8aa9ef
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T02:52:35Z

    make Cofactor standalone class: copied code from FactorizedModel

commit 90f89bbf1869960a59206d942b0fc73e4d678714
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T02:56:19Z

    Remove user bias because cofactor paper does not use it

commit 6e1539bd225e38d36fe1b46bbd0eab24691dd59d
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T03:26:09Z

    Added numItems to getOptions

commit 8e12e28240f6e967cc39ff5df1dec0b48804e3cd
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T03:26:42Z

    Implementing RatingInitializer

commit e04ccc1a185bffaa4f5d5834575290ff18a8a52b
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T03:27:06Z

    Added batch training class

commit 747c90c79827d65e23200bb7adb12c4e89d52b9e
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T03:27:26Z

    Copied and pasted from OnlineMatrixFactorizationUDTF

commit d482f2c101628bffd2e3fc30ca1035a06b1e80c1
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T05:29:48Z

    Implemented part of process()

commit 51cb2f05ed637a48167ce2db4b7b23fe18ba19a3
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T06:00:42Z

    WIP: implementing process

commit e9e8a31b384a45638fee1ad0b41eefa989ea18ed
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T08:13:01Z

    Removing zero features in input Feature[], Added test for nnz array creation

commit 73b68e29c9eda5c636f68f2e930ac56dd617c707
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T08:17:34Z

    Assert non-zero entries in Features when updating cooccurrence matrix

commit ef28ed82404b3ce14d7f76907825de7f6515adfe
Author: Jackson Huang <huang.j@...>
Date:   2018-09-26T08:50:53Z

    fix: use feature index instead of index of modified array for co-occurrence updates

commit 34063ccebc52e8b4374530227c679f28e3c444a0
Author: Jackson Huang <jackson_huang@...>
Date:   2018-09-27T06:34:08Z

    Removed SPPMI matrix as it will be supplied by the user

commit 65f9e72f3018992a2fb215bcacdb91acd8bf4296
Author: Jackson Huang <jackson_huang@...>
Date:   2018-09-27T07:13:06Z

    Rename weight variable names to be the same as cofacto.py

commit 5b27101b0d41263a3f422f99cba3f2f2f97f3958
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T07:00:28Z

    Change Feature#parseFeature method to public

commit 2bf64d02e8adc2005f9fa3280cca427279e170e4
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T07:01:28Z

    WIP: changed input argument format to process(...)

commit b6c7261ebc7fedcec5681aaa13457ca2a0a8fe77
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:25:00Z

    Refactor: less code duplication

commit 3c522ce1824c77bcba92e032d0f60ba91f7fd11b
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:29:23Z

    Better implementation of minibatch data structure, updated writing data to buffer

commit 985ea29f61c115d1b500fa2f4b6bea3a9af41f86
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:29:40Z

    Remove RatingInitializer interface

commit 2aeeb5250dcebd942cce081eb34475fb10bc8569
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:35:47Z

    Replace setBetaBias with flexible implementation

commit f174d2264be6d2ed479c53a5c2336faa1727fa89
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:35:58Z

    Reformatting

commit b40f48f6e27dd8ab6c01c0545d9405f301e76393
Author: Jackson Huang <jackson_huang@...>
Date:   2018-10-11T09:37:02Z

    More reformatting

----