GitHub user nzw0301 opened a pull request:
https://github.com/apache/incubator-hivemall/pull/116
[WIP][HIVEMALL-118] word2vec
## What changes were proposed in this pull request?
Add a new algorithm: skip-gram with negative sampling (a.k.a. word2vec).
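For reviewers less familiar with the technique, here is a minimal sketch of a single SGNS stochastic-gradient step. It assumes the standard formulation, with separate input vectors `v` and output vectors `u`, and a per-pair objective of log σ(u_c·v) + Σ log σ(−u_n·v) over k sampled negatives. The helper names are hypothetical. This is not the UDTF code in this PR.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """Hypothetical helper: one SGD step of skip-gram with negative sampling.

    w_in / w_out are lists of float vectors (input and output embeddings).
    `negatives` holds word ids drawn from the noise distribution.
    Returns the loss -log s(u_c.v) - sum log s(-u_n.v) *before* the update.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    targets = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in targets:
        u = w_out[word]
        score = sum(a * b for a, b in zip(v, u))
        loss += softplus(-score) if label == 1.0 else softplus(score)
        # (label - sigmoid(score)) is the gradient of the log-likelihood w.r.t. score
        g = lr * (label - sigmoid(score))
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # accumulate using the pre-update u
            u[i] += g * v[i]       # update the output vector in place
    for i in range(len(v)):
        v[i] += grad_v[i]          # update the input vector once, at the end
    return loss


# Illustrative usage on a made-up toy vocabulary of 5 word ids.
rng = random.Random(0)
dim = 8
# Small random input vectors and zero output vectors (a common initialization).
w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(5)]
w_out = [[0.0] * dim for _ in range(5)]
losses = [sgns_step(w_in, w_out, 0, 1, [2, 3], lr=0.1) for _ in range(200)]
```

In this sketch, the input-vector gradient is accumulated with the pre-update output vectors and applied once per pair. That way, each call takes a proper gradient step on that pair's loss.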
## What type of PR is it?
Improvement
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-118
## How was this patch tested?
Manual tests on Amazon EMR.
## How to use this feature?
Please see `word2vec.md`.
## Checklist
- [x] Did you apply the source code formatter, i.e., `mvn formatter:format`, to your commit?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nzw0301/incubator-hivemall skipgram
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/116.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #116
----
commit f19186fe8eff3de0400cc318c8c876fc69dbc766
Author: Kento NOZAWA
Date: 2017-09-13T12:50:35Z
Init docs for word2vec
commit e9a76093efd1803dd63a11697abc68e7ae78cbb8
Author: Kento NOZAWA
Date: 2017-09-13T12:54:33Z
Init Alias Table builder
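The alias table introduced in this commit is presumably used to draw negative samples in O(1) time per draw. The sketch below is a hedged illustration of that idea, not the code in this PR. It implements Vose's variant of Walker's alias method. The helper names are hypothetical, and the word counts are made up. The 0.75 exponent on counts follows the noise distribution from the original word2vec paper and may not match this implementation.

```python
import random


def build_alias_table(weights):
    """Hypothetical helper: Vose's alias method for O(1) sampling.

    Returns (prob, alias) such that drawing a uniform column i and keeping i
    with probability prob[i] (else taking alias[i]) samples index j with
    probability weights[j] / sum(weights).
    """
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # the mean of scaled is 1
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]          # column s keeps its own mass...
        alias[s] = l                 # ...and is topped up with mass from l
        scaled[l] = (scaled[l] + scaled[s]) - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:          # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def sample(prob, alias, rng):
    """Draw one index in O(1) time."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Illustrative noise distribution: made-up word counts raised to 0.75.
counts = [1, 2, 3, 4]
prob, alias = build_alias_table([c ** 0.75 for c in counts])
rng = random.Random(0)
negatives = [sample(prob, alias, rng) for _ in range(5)]
```

Building the table takes O(n) time. After that, each negative costs one uniform column pick and one biased coin flip, regardless of vocabulary size.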
commit b6883b00e8f543130e16caa2ad9010cfcd0e6cd9
Author: Kento NOZAWA
Date: 2017-09-13T13:02:10Z
Fix typo
commit 3ca761956638c95264a37edfb49133f35c37cdd5
Author: Kento NOZAWA
Date: 2017-09-14T02:15:51Z
Separate alias table function
commit 2588e732b1194d800005c6ac28453adf79d89278
Author: Kento NOZAWA
Date: 2017-09-14T12:26:49Z
Create skip-gram UDTF
commit 33900380a337821f2da280d0c7c9c10f2fe14565
Author: Kento NOZAWA
Date: 2017-09-15T03:35:36Z
Use float to save memory
commit a7394a04bc699e105febf86a6b9712f8daa8ff92
Author: Kento NOZAWA
Date: 2017-09-15T07:04:44Z
Update query example in docs
commit 50d5ffcffff503f1522577b7789e7745650a1ece
Author: Kento NOZAWA
Date: 2017-09-15T07:05:21Z
Update forwarding
commit b701919825b3ed8909659b77236d6522f6d51be6
Author: Kento NOZAWA
Date: 2017-09-18T12:03:55Z
Init Word2vecFeatureUDTF
commit 7f69ef66ef2849e57d06601fb9098b6bf2c2bb21
Author: Kento NOZAWA
Date: 2017-09-19T08:53:26Z
Implement skip-gram feature UDTF
commit 48c1929b6345d144059eec44636c0194eebc5281
Author: Kento NOZAWA
Date: 2017-09-19T14:01:46Z
Update skipgram
commit 7f4abde3760cf17455d99eb6c5a7cbe8c3dc5e3c
Author: Kento NOZAWA
Date: 2017-09-19T16:05:15Z
Refactor Skipgram
commit 2e429d60ff2ae473a64cb890e67c9d8b9a7b2830
Author: Kento NOZAWA
Date: 2017-09-20T05:26:25Z
Update for query change
commit 7014e8552f81d59ab35c95d8fcf54c56c24ba2c9
Author: Kento NOZAWA
Date: 2017-09-20T08:54:51Z
Remove discard table
----