[
https://issues.apache.org/jira/browse/HIVEMALL-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874311#comment-16874311
]
ASF GitHub Bot commented on HIVEMALL-259:
-----------------------------------------
myui commented on pull request #195: [HIVEMALL-259][DOC] Refactor feature_binning UDF
URL: https://github.com/apache/incubator-hivemall/pull/195
## What changes were proposed in this pull request?
Refactor the feature_binning UDF and update its usage.
## What type of PR is it?
Documentation, Refactoring
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-259
## How was this patch tested?
unit tests, manual tests on EMR
## How to use this feature?
```
WITH extracted as (
  select
    extract_feature(feature) as index,
    extract_weight(feature) as value
  from
    input l
    LATERAL VIEW explode(features) r as feature
),
mapping as (
  select
    index,
    build_bins(value, 5, true) as quantiles -- 5 bins with auto bin shrinking
  from
    extracted
  group by
    index
),
bins as (
  select
    to_map(index, quantiles) as quantiles
  from
    mapping
)
select
  l.features as original,
  feature_binning(l.features, r.quantiles) as features
from
  input l
  cross join bins r
```
See https://gist.github.com/myui/f943fa3ce1a7e1ac3f2dd9a7f9fa703b
## Checklist
(Please remove this section if not needed; check `x` for YES, blank for NO)
- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?
> [BUG] feature_binning does not work properly under certain conditions
> ---------------------------------------------------------------------
>
> Key: HIVEMALL-259
> URL: https://issues.apache.org/jira/browse/HIVEMALL-259
> Project: Hivemall
> Issue Type: Improvement
> Affects Versions: 0.5.2
> Reporter: Makoto Yui
> Assignee: Makoto Yui
> Priority: Trivial
> Fix For: 0.6.0
>
>
>
> feature_binning does not work properly under certain conditions.
> The cause may be a quantiles lookup that uses a key object of a different type at [this line|https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/ftvec/binning/FeatureBinningUDF.java#L133].
>
> {code:java}
> WITH extracted as (
> select
> extract_feature(feature) as index,
> extract_weight(feature) as value
> from
> input l
> LATERAL VIEW explode(features) r as feature
> ),
> bins as (
> select
> map(index, build_bins(value, 5, true)) as quantiles -- 5 bins with auto bin shrinking
> from
> extracted
> group by
> index
> )
> select
> l.features as original,
> feature_binning(l.features, r.quantiles) as features
> from
> input l
> cross join bins r
> ;
> {code}
>
>
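The suspected failure mode in the issue description, a map lookup with a key object of a different type than the stored keys, can be illustrated with a minimal Java sketch. This is not the Hivemall code: the class `KeyTypeMismatch`, both helper methods, and the `String` versus `Integer` key pair are assumptions made for illustration, since the ticket does not say which types are involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from FeatureBinningUDF.
public class KeyTypeMismatch {

    // Looks up the quantiles with the key object exactly as given.
    public static double[] rawLookup(Map<Object, double[]> quantiles, Object key) {
        return quantiles.get(key);
    }

    // Converts the key to its String form before the lookup.
    public static double[] normalizedLookup(Map<Object, double[]> quantiles, Object key) {
        return quantiles.get(String.valueOf(key));
    }

    public static void main(String[] args) {
        Map<Object, double[]> quantiles = new HashMap<>();
        // The map is keyed by the String "1" (assumed key type).
        quantiles.put("1", new double[] {Double.NEGATIVE_INFINITY, 0.5, Double.POSITIVE_INFINITY});

        // Integer 1 never equals String "1", so the raw lookup misses.
        System.out.println("raw lookup found bins: " + (rawLookup(quantiles, 1) != null));
        // After normalizing the key type, the same entry is found.
        System.out.println("normalized lookup found bins: " + (normalizedLookup(quantiles, 1) != null));
    }
}
```

A miss of this kind returns `null` silently instead of raising an error, which would be consistent with a function that fails only under certain conditions.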