GitHub user takuti opened a pull request:
https://github.com/apache/incubator-hivemall/pull/108
[HIVEMALL-138] `to_ordered_map` UDAF with size limit
## What changes were proposed in this pull request?
Implement the `to_bounded_ordered_map` UDAF. The UDAF is an extended version of
`to_ordered_map` that limits the size of the resulting map.
`to_bounded_ordered_map` can be used as an alternative to the `each_top_k`
UDTF. The main difference is that the former actively takes advantage of
mapper-side aggregation.
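The mapper-side aggregation point can be illustrated with a minimal Java sketch. This is not Hivemall's implementation. It assumes the bounded map is a `TreeMap` that evicts its last entry whenever it grows past `size`. The helper names `put` and `merge` are hypothetical and only loosely mirror the roles of `iterate` and `merge` in Hive's `GenericUDAFEvaluator`. Each mapper keeps at most `size` entries, and the reducer merges those truncated partial maps. This is safe because any key among the global `size` smallest is also among the `size` smallest of whichever partial it came from.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a size-bounded ordered map with partial-result merging. */
public class BoundedOrderedMapSketch {

    // Insert one pair, then evict the entry that sorts last if the map exceeds `size`.
    // Assumption: a duplicate key simply overwrites the earlier value.
    public static <K, V> void put(TreeMap<K, V> map, K key, V value, int size) {
        map.put(key, value);
        if (map.size() > size) {
            map.pollLastEntry();
        }
    }

    // Merge a truncated partial result (e.g. from one mapper) into an accumulator.
    public static <K, V> void merge(TreeMap<K, V> acc, Map<K, V> partial, int size) {
        for (Map.Entry<K, V> e : partial.entrySet()) {
            put(acc, e.getKey(), e.getValue(), size);
        }
    }

    public static void main(String[] args) {
        int size = 2;
        // Two "mappers" each see part of the input and keep at most `size` keys.
        TreeMap<Integer, String> mapper1 = new TreeMap<>();
        put(mapper1, 10, "apple", size);
        put(mapper1, 3, "banana", size);
        TreeMap<Integer, String> mapper2 = new TreeMap<>();
        put(mapper2, 4, "candy", size);

        // The "reducer" only has to merge small, already-bounded partials.
        TreeMap<Integer, String> result = new TreeMap<>();
        merge(result, mapper1, size);
        merge(result, mapper2, size);
        System.out.println(result); // {3=banana, 4=candy}
    }
}
```

Because each partial map never holds more than `size` entries, the amount of data shuffled to the reducer stays bounded regardless of how many rows each mapper reads.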
## What type of PR is it?
Feature
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-138
## How was this patch tested?
Manually tested on a local environment and on EMR
## How to use this feature?
```
to_bounded_ordered_map(key, value, size [, const boolean reverseOrder=false])
```
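The `size` and `reverseOrder` arguments could be handled as in the following hypothetical Java sketch, which is not the PR's code. The commit list mentions throwing an exception for an invalid map size. This sketch assumes "invalid" means `size <= 0` and uses `IllegalArgumentException` as a stand-in for whatever exception the UDAF actually throws. `reverseOrder` only selects the key ordering of the underlying `TreeMap`.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class BoundedMapArgs {

    /**
     * Hypothetical helper: rejects a non-positive size (an assumption of this
     * sketch) and returns an empty TreeMap ordered ascending, or descending
     * when reverseOrder is true.
     */
    public static <K extends Comparable<K>, V> TreeMap<K, V> newOrderedBuffer(
            int size, boolean reverseOrder) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        Comparator<K> order = reverseOrder
                ? Comparator.<K>reverseOrder()
                : Comparator.<K>naturalOrder();
        return new TreeMap<>(order);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> desc = newOrderedBuffer(3, true);
        desc.put(3, "banana");
        desc.put(10, "apple");
        desc.put(4, "candy");
        System.out.println(desc.firstKey()); // 10 (largest key first)

        try {
            BoundedMapArgs.<Integer, String>newOrderedBuffer(0, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // size must be positive: 0
        }
    }
}
```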
```sql
with t as (
select 10 as key, 'apple' as value
union all
select 3 as key, 'banana' as value
union all
select 4 as key, 'candy' as value
)
select
to_bounded_ordered_map(key, value, 1),
to_bounded_ordered_map(key, value, 2),
to_bounded_ordered_map(key, value, 3),
to_bounded_ordered_map(key, value, 100),
to_bounded_ordered_map(key, value, 1, true),
to_bounded_ordered_map(key, value, 2, true),
to_bounded_ordered_map(key, value, 3, true),
to_bounded_ordered_map(key, value, 100, true)
from t
;
```
```
{3:"banana"} {3:"banana",4:"candy"} {3:"banana",4:"candy",10:"apple"} {3:"banana",4:"candy",10:"apple"} {10:"apple"} {10:"apple",4:"candy"} {10:"apple",4:"candy",3:"banana"} {10:"apple",4:"candy",3:"banana"}
```
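The example result can be reproduced with a small, hypothetical Java re-implementation of the semantics. It keeps the `size` smallest keys, or the `size` largest when `reverseOrder` is true. It is a sketch for checking the expected output, not the UDAF itself. The `toHiveString` helper is an assumption that only mimics the CLI rendering shown above.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class BoundedOrderedMapExample {

    // Hypothetical re-implementation: insert each pair, then evict the last
    // entry in the chosen order whenever the map grows past `size`.
    public static TreeMap<Integer, String> boundedOrderedMap(
            int[] keys, String[] values, int size, boolean reverseOrder) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        TreeMap<Integer, String> map = new TreeMap<>(reverseOrder
                ? Comparator.<Integer>reverseOrder()
                : Comparator.<Integer>naturalOrder());
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
            if (map.size() > size) {
                map.pollLastEntry();
            }
        }
        return map;
    }

    // Render a map like the Hive CLI output above, e.g. {3:"banana",4:"candy"}.
    public static String toHiveString(TreeMap<Integer, String> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(":\"").append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Same rows as the `with t as (...)` example, in the same order.
        int[] keys = {10, 3, 4};
        String[] values = {"apple", "banana", "candy"};
        for (boolean reverse : new boolean[] {false, true}) {
            for (int size : new int[] {1, 2, 3, 100}) {
                System.out.println(toHiveString(boundedOrderedMap(keys, values, size, reverse)));
            }
        }
    }
}
```

Running `main` prints the eight maps of the example result in the same order, one per line.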
## Checklist
- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/takuti/incubator-hivemall topk-ordered-map
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/108.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #108
----
commit 78403e8a3cb99b6bccdf2500ad5551d413345222
Author: Takuya Kitazawa <[email protected]>
Date: 2017-08-07T05:26:13Z
Fix typo
commit 46a23a2129ea74244e8a42b6aa5d9da9d5cf8ba1
Author: Takuya Kitazawa <[email protected]>
Date: 2017-08-07T07:14:42Z
Implement `to_bounded_ordered_map` UDAF
commit 3c029f9bd71adb70db8dfc48f6452362dacc164c
Author: Takuya Kitazawa <[email protected]>
Date: 2017-08-07T07:24:15Z
Throw an exception for invalid map size
----