[ 
https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31189:
-----------------------------
    Description: 
In real-world datasets, a categorical feature may have millions of distinct 
values, some of which occur only a few times. Handling these less frequent 
values specially can improve the performance of some algorithms.
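
As a rough illustration of the idea (not the Flink ML API, and not necessarily
the design this issue proposes), the sketch below indexes only the values that
occur at least a given number of times and sends everything else to one shared
bucket. The names fit_string_index, transform, and min_count are hypothetical
and chosen only for this example.

```python
from collections import Counter


def fit_string_index(values, min_count=2):
    """Build a value -> index map, skipping values seen fewer than min_count times.

    Kept values are ordered by descending frequency, ties broken alphabetically.
    `min_count` is a hypothetical parameter name used for this sketch.
    """
    counts = Counter(values)
    frequent = sorted(
        (v for v, c in counts.items() if c >= min_count),
        key=lambda v: (-counts[v], v),
    )
    return {v: i for i, v in enumerate(frequent)}


def transform(values, index):
    """Encode values; rare or unseen values share one extra bucket, len(index)."""
    other = len(index)
    return [index.get(v, other) for v in values]


data = ["a", "b", "a", "c", "a", "b", "d"]
index = fit_string_index(data, min_count=2)
encoded = transform(data, index)
```

With this input, only "a" and "b" meet the threshold, so "c" and "d" are
encoded with the same shared index instead of each getting its own.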

 

One  

> Allow ignore less frequent values in StringIndexer
> --------------------------------------------------
>
>                 Key: FLINK-31189
>                 URL: https://issues.apache.org/jira/browse/FLINK-31189
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>            Reporter: Fan Hong
>            Priority: Major
>
> In real-world datasets, a categorical feature may have millions of distinct 
> values, some of which occur only a few times. Handling these less frequent 
> values specially can improve the performance of some algorithms.
>  
> One  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
