[ https://issues.apache.org/jira/browse/FLINK-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler closed FLINK-1735.
-----------------------------------
Resolution: Won't Fix
Closing since flink-ml is effectively frozen.
> Add FeatureHasher to machine learning library
> ---------------------------------------------
>
> Key: FLINK-1735
> URL: https://issues.apache.org/jira/browse/FLINK-1735
> Project: Flink
> Issue Type: New Feature
> Components: Library / Machine Learning
> Reporter: Till Rohrmann
> Assignee: Felix Neutatz
> Priority: Major
> Labels: ML, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The hashing trick [1,2] is a common way to vectorize arbitrary feature
> values. The hash of a feature value determines the index of the vector entry
> it updates. To mitigate collisions, a second hash function determines the
> sign of the update value added to that entry. This way, colliding updates
> are likely to cancel out rather than accumulate.
> A feature hasher would also be helpful for NLP problems, where it could be
> used to vectorize bag-of-words or n-gram features.
> Resources:
> [1] [https://en.wikipedia.org/wiki/Feature_hashing]
> [2] [http://scikit-learn.org/stable/modules/feature_extraction.html#feature-extraction]
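
For readers unfamiliar with the technique, the signed hashing trick described in the issue can be sketched roughly as follows. This is only an illustration in plain Python, not code from flink-ml or from the associated pull request. The names `hash_features`, `_stable_hash`, and `num_features`, as well as the choice of salted MD5 as the two hash functions, are assumptions made for this example.

```python
import hashlib


def _stable_hash(value: str, salt: str) -> int:
    """Deterministic 64-bit hash of a string (hypothetical helper).

    Python's built-in hash() is randomized per process for strings,
    so a salted MD5 digest is used here instead. The salt lets one
    digest function act as two different hash functions.
    """
    digest = hashlib.md5((salt + value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_features(tokens, num_features=16):
    """Map an iterable of string features to a fixed-size dense vector."""
    vec = [0.0] * num_features
    for token in tokens:
        # First hash function: selects the vector entry to update.
        index = _stable_hash(token, "index:") % num_features
        # Second hash function: selects the sign of the update, so that
        # colliding features tend to cancel instead of accumulating.
        sign = 1.0 if _stable_hash(token, "sign:") % 2 == 0 else -1.0
        vec[index] += sign
    return vec


if __name__ == "__main__":
    # Bag-of-words example: each word contributes +1 or -1 to one entry.
    print(hash_features("the quick brown fox jumps over the lazy dog".split()))
```

The output dimension is fixed up front by `num_features`, so no vocabulary has to be built or stored. A smaller value saves memory but makes collisions more frequent.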