GitHub user ChristophAl opened a pull request: https://github.com/apache/flink/pull/665
[FLINK-1735] Feature Hasher

The prototype of the feature hasher.

- The implementation is based on the scikit-learn feature hasher
- Test vectors have been generated by scikit-learn as well
- Currently the implementation only works on Seq[String]

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChristophAl/flink FLINK-1735_FeatureHasher

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #665

----
commit e5ad7e842f443dd4b15fe21f3d1d89c238c882d1
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-06T22:10:24Z

    Initial commit Issue #1735

commit 1e9312fdc46b741faea6bdfb26fc4ce359cd1cfa
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-08T13:54:53Z

    Added basic testcase for FeatureHasher

commit a0c6ee6251edc4d0e556ba98886a783a072bd27b
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-08T13:58:59Z

    FeatureHasher prototype

    - Added a prototype of Feature Hasher, currently accepts Seq[String] only

commit c55eb11fa21943dd8451256755bc707a59c3f5d3
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-08T14:09:48Z

    Corrected typos

commit 7002ab9e18a6cca5b55d700967accb375538faad
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-09T14:25:42Z

    Moved Featurehasher to feature.extraction package

commit 15b868f08806b375fff564f851f668122d363457
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-09T14:31:19Z

    Readded FeatureHasher.scala

commit 38e0650ebdec305c4a51e788699da0809a3b1973
Author: Christoph Alt <christoph....@posteo.de>
Date:   2015-05-09T18:36:00Z

    Reformated test vectors

----
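
For readers unfamiliar with the technique, here is a minimal sketch of feature hashing over a Seq[String], the input type the pull request mentions. It is not the code from the pull request. The object name `FeatureHashingSketch`, the method `hashFeatures`, and its parameters are hypothetical, and the dense `Array[Double]` output is an assumption made for brevity. It uses Scala's built-in `MurmurHash3.stringHash`, which hashes UTF-16 characters, so it should not be expected to reproduce scikit-learn's test vectors.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical illustration only; not the FeatureHasher from the pull request.
object FeatureHashingSketch {

  /** Maps a sequence of tokens to a dense vector of length numFeatures
    * using the signed hashing trick.
    */
  def hashFeatures(tokens: Seq[String], numFeatures: Int, seed: Int = 0): Array[Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    val vector = new Array[Double](numFeatures)
    for (token <- tokens) {
      // 32-bit signed hash of the token
      val h = MurmurHash3.stringHash(token, seed)
      // Widen to Long before abs so Int.MinValue cannot overflow,
      // then fold the hash into the range [0, numFeatures)
      val index = (math.abs(h.toLong) % numFeatures).toInt
      // The sign of the hash decides whether the bucket is incremented or
      // decremented, which lets colliding tokens partially cancel out
      val sign = if (h >= 0) 1.0 else -1.0
      vector(index) += sign
    }
    vector
  }
}
```

A call such as `FeatureHashingSketch.hashFeatures(Seq("flink", "feature", "hasher"), 8)` returns an array of length 8 in which each token has added +1 or -1 to exactly one bucket. A production version would more likely emit a sparse vector, since the number of features is usually much larger than the number of tokens per sample.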