GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/118

    [HIVEMALL-146] Yet another UDF to generate n-grams

    ## What changes were proposed in this pull request?
    
    Add a new UDF `to_ngrams(array<string> words, int minSize, int maxSize)` 
that returns the list of n-grams with `minSize <= n <= maxSize` for the given 
words. This UDF can serve as an alternative to Hive's built-in `ngrams` function.
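
To make the `minSize <= n <= maxSize` contract concrete, here is a minimal standalone Java sketch of the enumeration. It is not the code in this patch: the class `NGramSketch` and the helper `toNgrams` are hypothetical names, and joining words with a single space and ordering the results by start position are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the UDF implemented in this pull request.
final class NGramSketch {

    // Returns every n-gram of `words` whose length n satisfies minSize <= n <= maxSize.
    // Assumption: words in an n-gram are joined with a single space.
    static List<String> toNgrams(List<String> words, int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("require 1 <= minSize <= maxSize");
        }
        List<String> ngrams = new ArrayList<>();
        // For each start position, emit n-grams of increasing length
        // until the window would run past the end of the input.
        for (int start = 0; start < words.size(); start++) {
            for (int n = minSize; n <= maxSize && start + n <= words.size(); n++) {
                ngrams.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // Prints [machine, machine learning, learning, learning library, library]
        System.out.println(toNgrams(Arrays.asList("machine", "learning", "library"), 1, 2));
    }
}
```

Under these assumptions, an input shorter than `minSize` simply yields an empty list; how the actual UDF treats that case is documented with the patch.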
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-146
    
    ## How was this patch tested?
    
    Unit tests, plus manual tests on both EMR and local Hive
    
    ## How to use this feature?
    
    As documented
    
    ## Checklist
    
    (Please remove this section if not needed; check `x` for YES, blank for NO)
    
    - [x] Did you apply the source code formatter, i.e., `mvn formatter:format`, 
to your commit?
    - [x] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall HIVEMALL-146-ngrams

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #118
    
----
commit 6e9d08f264c173410e4a90fa4533db0dd28836ca
Author: Takuya Kitazawa <[email protected]>
Date:   2017-10-02T05:38:31Z

    Implement `to_ngrams` UDF

commit df81ee2de13666636068c1691f595b775e39a6f5
Author: Takuya Kitazawa <[email protected]>
Date:   2017-10-02T05:46:35Z

    Update document

----