GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/108

    [HIVEMALL-138] `to_ordered_map` UDAF with size limit

    ## What changes were proposed in this pull request?
    
    Implement the `to_bounded_ordered_map` UDAF, an extended version of
    `to_ordered_map` that limits the size of the resulting map.
    
    The `to_bounded_ordered_map` UDAF can be used as an alternative to the
    `each_top_k` UDTF. The main difference is that the UDAF actively utilizes
    mapper-side aggregation.
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-138
    
    ## How was this patch tested?
    
    Manual tests in a local environment and on EMR
    
    ## How to use this feature?
    
    ```
    to_bounded_ordered_map(key, value, size [, const boolean reverseOrder=false])
    ```
    
    ```sql
    with t as (
        select 10 as key, 'apple' as value
        union all
        select 3 as key, 'banana' as value
        union all
        select 4 as key, 'candy' as value
    )
    select
        to_bounded_ordered_map(key, value, 1),
        to_bounded_ordered_map(key, value, 2),
        to_bounded_ordered_map(key, value, 3),
        to_bounded_ordered_map(key, value, 100),
        to_bounded_ordered_map(key, value, 1, true),
        to_bounded_ordered_map(key, value, 2, true),
        to_bounded_ordered_map(key, value, 3, true),
        to_bounded_ordered_map(key, value, 100, true)
    from t
    ;
    ```
    
    > {3:"banana"}    {3:"banana",4:"candy"}    {3:"banana",4:"candy",10:"apple"}    {3:"banana",4:"candy",10:"apple"}    {10:"apple"}    {10:"apple",4:"candy"}    {10:"apple",4:"candy",3:"banana"}    {10:"apple",4:"candy",3:"banana"}
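
To illustrate why a size-bounded map fits mapper-side aggregation, here is a minimal Python sketch of the semantics shown above. It is not the Hivemall implementation: the function names `bounded_ordered_map` and `merge_partials` are hypothetical, treating a non-positive size as invalid is an assumption, and so is the rule that the last value wins for duplicate keys. The idea is that each mapper can keep only its own top `size` entries, and merging those bounded partial maps still yields the same result as aggregating all rows at once.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]


def bounded_ordered_map(rows: Iterable[Row], size: int,
                        reverse_order: bool = False) -> Dict[int, str]:
    """Keep the `size` smallest keys (largest if reverse_order) in key order."""
    if size <= 0:
        # Assumption: the PR only says an invalid size raises an exception.
        raise ValueError("size must be positive")
    latest = dict(rows)  # assumption: the last value wins for duplicate keys
    kept_keys = sorted(latest, reverse=reverse_order)[:size]
    return {k: latest[k] for k in kept_keys}


def merge_partials(partials: List[Dict[int, str]], size: int,
                   reverse_order: bool = False) -> Dict[int, str]:
    """Combine bounded partial maps (one per mapper) into a final bounded map."""
    merged_rows = [item for partial in partials for item in partial.items()]
    return bounded_ordered_map(merged_rows, size, reverse_order)


rows = [(10, "apple"), (3, "banana"), (4, "candy")]

# Single-pass aggregation over all rows.
print(bounded_ordered_map(rows, 2))        # {3: 'banana', 4: 'candy'}
print(bounded_ordered_map(rows, 2, True))  # {10: 'apple', 4: 'candy'}

# Two hypothetical mappers each keep a bounded partial map, then merge.
partials = [bounded_ordered_map(rows[:2], 2), bounded_ordered_map(rows[2:], 2)]
print(merge_partials(partials, 2))         # {3: 'banana', 4: 'candy'}
```

Because every partial map holds at most `size` entries, the amount of data passed from mappers to reducers stays bounded regardless of how many rows each mapper reads.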
    
    ## Checklist
    
    - [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall topk-ordered-map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #108
    
----
commit 78403e8a3cb99b6bccdf2500ad5551d413345222
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T05:26:13Z

    Fix typo

commit 46a23a2129ea74244e8a42b6aa5d9da9d5cf8ba1
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T07:14:42Z

    Implement `to_bounded_ordered_map` UDAF

commit 3c029f9bd71adb70db8dfc48f6452362dacc164c
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T07:24:15Z

    Throw an exception for invalid map size

----


