[ https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhipeng Zhang reassigned FLINK-31189:
-------------------------------------

    Assignee: Zhipeng Zhang

> Allow special handle of less frequent values in StringIndexer
> -------------------------------------------------------------
>
>                 Key: FLINK-31189
>                 URL: https://issues.apache.org/jira/browse/FLINK-31189
>             Project: Flink
>          Issue Type: Improvement
>          Components: Library / Machine Learning
>            Reporter: Fan Hong
>            Assignee: Zhipeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> Real-world datasets often contain categorical features with millions of
> distinct values, some of which may only appear a few times. To maximize the
> performance of certain algorithms, it is important to treat these less
> frequent values properly. A popular approach is to put them to a special
> index, as is done in sklearn's OneHotEncoder [1].
>
> [1]
> https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
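
For illustration only, the "special index" approach described in the issue can be sketched in plain Python as follows. This is not the Flink ML StringIndexer API or its eventual implementation; the names fit_index, transform, and min_freq are hypothetical, and sending unseen values to the same shared index is an assumption made for this sketch.

```python
from collections import Counter


def fit_index(values, min_freq=2):
    """Build a value -> index map; values seen fewer than min_freq times get no own index.

    Hypothetical helper for illustration, not part of Flink ML or scikit-learn.
    """
    counts = Counter(values)
    # Order by descending frequency, then by value, so the mapping is deterministic.
    frequent = sorted(
        (v for v, c in counts.items() if c >= min_freq),
        key=lambda v: (-counts[v], v),
    )
    index = {v: i for i, v in enumerate(frequent)}
    # One shared index, placed after all frequent values, for everything else.
    infrequent_index = len(index)
    return index, infrequent_index


def transform(values, index, infrequent_index):
    """Map each value to its index; infrequent (and, by assumption, unseen) values share one index."""
    return [index.get(v, infrequent_index) for v in values]


data = ["a", "b", "a", "c", "a", "b", "d"]
index, rare = fit_index(data, min_freq=2)
encoded = transform(data, index, rare)
```

Here "a" and "b" keep their own indices, while "c" and "d", which each appear once, collapse into the single shared index. Whether the threshold should be an absolute count, a fraction of rows, or a cap on the number of categories is a design choice that this sketch leaves open.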