GitHub user takuti opened a pull request:
https://github.com/apache/incubator-hivemall/pull/118
[HIVEMALL-146] Yet another UDF to generate n-grams
## What changes were proposed in this pull request?
Add a new UDF `to_ngrams(array<string> words, int minSize, int maxSize)`
that returns the list of n-grams with `minSize <= n <= maxSize` for the given words. This
UDF can serve as an alternative to Hive's built-in `ngrams` function.
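To illustrate the intended semantics, here is a minimal Python sketch of n-gram generation over a word list. It is not the Java code from this patch. The space joiner, the output order, and the argument validation are assumptions made for illustration, so the actual UDF may differ on those points.

```python
from typing import List


def to_ngrams(words: List[str], min_size: int, max_size: int) -> List[str]:
    """Return every n-gram of `words` with min_size <= n <= max_size.

    Illustrative sketch only: joining words with a single space and
    emitting n-grams grouped by start position are assumptions.
    """
    if min_size < 1 or max_size < min_size:
        raise ValueError("require 1 <= min_size <= max_size")

    ngrams: List[str] = []
    for start in range(len(words)):
        for n in range(min_size, max_size + 1):
            end = start + n
            if end > len(words):
                # Longer n-grams from this start position cannot fit either.
                break
            ngrams.append(" ".join(words[start:end]))
    return ngrams


if __name__ == "__main__":
    print(to_ngrams(["machine", "learning", "library"], 1, 2))
```

With `minSize = 1` and `maxSize = 2`, the sketch yields all unigrams and bigrams of the input; a `maxSize` larger than the number of words simply produces no n-grams of that length.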
## What type of PR is it?
Feature
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-146
## How was this patch tested?
Unit tests, plus manual tests on both EMR and local Hive
## How to use this feature?
as documented
## Checklist
(Please remove this section if not needed; check `x` for YES, blank for NO)
- [x] Did you apply source code formatter, i.e., `mvn formatter:format`,
for your commit?
- [x] Did you run system tests on Hive (or Spark)?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/takuti/incubator-hivemall HIVEMALL-146-ngrams
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/118.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #118
----
commit 6e9d08f264c173410e4a90fa4533db0dd28836ca
Author: Takuya Kitazawa <[email protected]>
Date: 2017-10-02T05:38:31Z
Implement `to_ngrams` UDF
commit df81ee2de13666636068c1691f595b775e39a6f5
Author: Takuya Kitazawa <[email protected]>
Date: 2017-10-02T05:46:35Z
Update document
----