[ https://issues.apache.org/jira/browse/HIVEMALL-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874311#comment-16874311 ]

ASF GitHub Bot commented on HIVEMALL-259:
-----------------------------------------

myui commented on pull request #195: [HIVEMALL-259][DOC] Refactor feature_binning UDF
URL: https://github.com/apache/incubator-hivemall/pull/195
 
 
   ## What changes were proposed in this pull request?
   
   Refactor the `feature_binning` UDF and update its usage documentation.
   
   ## What type of PR is it?
   
   Documentation, Refactoring
   
   ## What is the Jira issue?
   
   https://issues.apache.org/jira/browse/HIVEMALL-259
   
   ## How was this patch tested?
   
   Unit tests, and manual tests on Amazon EMR.
   
   ## How to use this feature?
   
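   ```sql
   -- Assumes a table `input` whose `features` column is an array of "index:value" feature strings
   WITH extracted AS (
     -- One row per (feature index, feature value) pair
     SELECT
       extract_feature(feature) AS index,
       extract_weight(feature) AS value
     FROM
       input l
       LATERAL VIEW explode(features) r AS feature
   ),
   mapping AS (
     -- Bin boundaries for each feature index
     SELECT
       index,
       build_bins(value, 5, true) AS quantiles -- 5 bins with auto bin shrinking
     FROM
       extracted
     GROUP BY
       index
   ),
   bins AS (
     -- Collect all per-index boundaries into a single map
     SELECT
       to_map(index, quantiles) AS quantiles
     FROM
       mapping
   )
   SELECT
     l.features AS original,
     feature_binning(l.features, r.quantiles) AS features
   FROM
     input l
     CROSS JOIN bins r
   ;
   ```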
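   The following is a rough mental model of the query above, not Hivemall's implementation:
   - `build_bins` turns the values of one feature index into sorted bin boundaries.
   - `to_map` collects those boundaries into one map keyed by index.
   - `feature_binning` rewrites each `index:value` feature as `index:bin`.

   The Java sketch below rests on several assumptions:
   - quantiles use nearest-rank selection;
   - boundaries are wrapped in `-Infinity`/`Infinity` sentinels;
   - bins are half-open intervals `[q_i, q_{i+1})`;
   - "auto bin shrinking" merges duplicate boundaries;
   - features whose index has no boundaries pass through unchanged.

   The class `BinningSketch` and its method names are hypothetical.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical sketch of the build_bins / feature_binning pipeline; not Hivemall code.
   final class BinningSketch {

       // Nearest-rank quantile boundaries wrapped in -Infinity/+Infinity sentinels.
       // With autoShrink (assumed semantics), a boundary equal to the previous one
       // is dropped, so tied values yield fewer bins instead of empty ones.
       static double[] buildBins(double[] values, int numBins, boolean autoShrink) {
           List<Double> bounds = new ArrayList<>();
           bounds.add(Double.NEGATIVE_INFINITY);
           if (values.length > 0) {
               double[] sorted = values.clone();
               Arrays.sort(sorted);
               int n = sorted.length;
               for (int i = 1; i < numBins; i++) {
                   // ceil(i * n / numBins) - 1, computed in integer arithmetic
                   int rank = (int) (((long) i * n + numBins - 1) / numBins - 1);
                   double b = sorted[rank];
                   if (!autoShrink || b != bounds.get(bounds.size() - 1)) {
                       bounds.add(b);
                   }
               }
           }
           bounds.add(Double.POSITIVE_INFINITY);
           double[] out = new double[bounds.size()];
           for (int i = 0; i < out.length; i++) {
               out[i] = bounds.get(i);
           }
           return out;
       }

       // Bin id of `value` for sorted boundaries [q0, ..., qn]; bin i covers [q_i, q_{i+1}).
       static int findBin(double[] boundaries, double value) {
           int pos = Arrays.binarySearch(boundaries, value);
           // Exact hit on q_i opens bin i; otherwise binarySearch returns
           // -(insertionPoint) - 1 and the value lies in bin insertionPoint - 1.
           int bin = pos >= 0 ? pos : -pos - 2;
           // Clamp so a value equal to the last boundary stays in the last bin.
           return Math.max(0, Math.min(bin, boundaries.length - 2));
       }

       // Rewrites "index:value" features as "index:bin"; features without an
       // explicit value or without boundaries are passed through (assumption).
       static List<String> featureBinning(List<String> features, Map<String, double[]> quantiles) {
           List<String> out = new ArrayList<>(features.size());
           for (String f : features) {
               int sep = f.lastIndexOf(':');
               if (sep < 0) {
                   out.add(f);
                   continue;
               }
               String index = f.substring(0, sep);
               double value = Double.parseDouble(f.substring(sep + 1));
               double[] q = quantiles.get(index);
               out.add(q == null ? f : index + ":" + findBin(q, value));
           }
           return out;
       }

       public static void main(String[] args) {
           double[] ages = {5, 12, 18, 22, 30, 35, 41, 50, 63, 70};
           Map<String, double[]> quantiles = new HashMap<>();
           quantiles.put("age", buildBins(ages, 5, true));
           System.out.println(Arrays.toString(quantiles.get("age")));
           System.out.println(featureBinning(Arrays.asList("age:12", "age:40", "height:170"), quantiles));
       }
   }
   ```

   Running `main` prints the boundaries `[-Infinity, 12.0, 22.0, 35.0, 50.0, Infinity]` and then `[age:1, age:3, height:170]`. Note that `quantiles.get(index)` only succeeds when `index` equals the stored key under `equals`, which is why the key type used for the lookup matters.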
   
   See also https://gist.github.com/myui/f943fa3ce1a7e1ac3f2dd9a7f9fa703b
   
   ## Checklist
   
   
   - [x] Did you apply the source code formatter (i.e., `./bin/format_code.sh`) to your commit?
   - [x] Did you run system tests on Hive (or Spark)?
   
 


> [BUG] feature_binning does not work properly under certain conditions
> ---------------------------------------------------------------------
>
>                 Key: HIVEMALL-259
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-259
>             Project: Hivemall
>          Issue Type: Improvement
>    Affects Versions: 0.5.2
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>            Priority: Trivial
>             Fix For: 0.6.0
>
>
>  
> feature_binning does not work properly under certain conditions.
> The bug might be in the quantiles lookup at [this line|https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/ftvec/binning/FeatureBinningUDF.java#L133], where the map may be queried with a key object of a different type than the stored keys.
>  
> {code:sql}
> WITH extracted as (
>   select
>     extract_feature(feature) as index,
>     extract_weight(feature) as value
>   from
>     input l
>     LATERAL VIEW explode(features) r as feature
> ),
> bins as (
>   select
>     map(index, build_bins(value, 5, true)) as quantiles -- 5 bins with auto bin shrinking
>   from
>     extracted
>   group by
>     index
> )
> select
>   l.features as original,
>   feature_binning(l.features, r.quantiles) as features
> from
>   input l
>   cross join bins r
> ;
> {code}
>  
>  
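The issue suspects that the quantiles lookup fails because the lookup key has a different type from the map's keys. In Java, `HashMap.get` matches keys via `equals`. Boxed values of different classes are never equal, even when they print the same, so such a lookup silently returns `null`.

The sketch below reproduces that pitfall with an `Integer` key looked up as a `Long` and as a `String`. The class `KeyTypeLookup` and its helpers are hypothetical. The issue does not state which types actually meet at the referenced line (Hadoop `Text` versus `java.lang.String` is one possibility). Normalizing keys to one type on both the put and the get side is a common remedy, not necessarily the fix applied in the pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a key-type mismatch in a map lookup; not Hivemall code.
final class KeyTypeLookup {

    // Hypothetical normalization: render every key as a String so that
    // Integer 1, Long 1L and "1" all resolve to the same entry.
    static String normalize(Object key) {
        return String.valueOf(key);
    }

    // Map keyed by whatever runtime type the key arrived with (the suspected pitfall).
    static Map<Object, double[]> rawMap(Object key, double[] quantiles) {
        Map<Object, double[]> m = new HashMap<>();
        m.put(key, quantiles);
        return m;
    }

    // Map keyed by the normalized String form.
    static Map<String, double[]> normalizedMap(Object key, double[] quantiles) {
        Map<String, double[]> m = new HashMap<>();
        m.put(normalize(key), quantiles);
        return m;
    }

    public static void main(String[] args) {
        double[] q = {Double.NEGATIVE_INFINITY, 0.5, Double.POSITIVE_INFINITY};

        // Stored under Integer 1; Integer.equals(Long) and Integer.equals(String)
        // are both false, so these lookups miss.
        Map<Object, double[]> raw = rawMap(1, q);
        System.out.println(raw.get(1L));  // null
        System.out.println(raw.get("1")); // null

        // With normalization on both the put and the get side, the lookup succeeds.
        Map<String, double[]> norm = normalizedMap(1, q);
        System.out.println(norm.get(normalize(1L)) == q); // true
    }
}
```

String normalization is itself lossy for numeric keys: `normalize(1)` is `"1"` but `normalize(1.0)` is `"1.0"`. A real fix therefore has to choose a canonical key type deliberately.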



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
