[ https://issues.apache.org/jira/browse/SPARK-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068159#comment-14068159 ]
Michael Yannakopoulos commented on SPARK-2511:
----------------------------------------------

I am really interested in this topic. I have seen the video of the Databricks Cloud demo, and they actually use MLlib to generate the TF-IDF model. Afterwards, they use the already trained model to find similarities between words derived from tweets in a streaming environment. Is this TF-IDF implementation actually available somewhere (as source code), or do we need to provide our own implementation? I am willing to help develop this new model/functionality.

Thanks,
Michael

> Add TF-IDF featurizer
> ---------------------
>
>                 Key: SPARK-2511
>                 URL: https://issues.apache.org/jira/browse/SPARK-2511
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Port the TF-IDF implementation that was used in the Databricks Cloud demo to
> MLlib.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
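For reference, the workflow described in the comment (fit a TF-IDF model once, then use it to score similarity between incoming tweets) can be sketched in plain Python. This is a minimal illustration under stated assumptions. It is not MLlib code and not the Databricks demo implementation, which is not shown in this thread. It assumes lowercase tokenization on non-alphanumeric characters, keys vectors by the raw term instead of hashing, and uses the smoothed weight idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) is the number of documents containing t. This is one common variant. The function names (`fit_idf`, `tfidf`, `most_similar`) and the sample tweets are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters (simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def fit_idf(docs):
    """Count, for each term, how many documents contain it. Returns (num_docs, doc_freq)."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # set(): count each term at most once per document
    return len(docs), doc_freq

def idf(model, term):
    """Smoothed IDF: log((m + 1) / (df(t) + 1)); unseen terms get the largest weight."""
    num_docs, doc_freq = model
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))

def tfidf(model, tokens):
    """Sparse TF-IDF vector as {term: raw term count * idf}."""
    return {term: count * idf(model, term) for term, count in Counter(tokens).items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(model, vectors, text):
    """Score an incoming text against stored vectors; return (best_index, all_scores)."""
    query = tfidf(model, tokenize(text))
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical sample tweets, used only to illustrate the flow.
corpus = [
    "spark streaming makes real time analytics easy",
    "mllib adds machine learning to spark",
    "watching the football match tonight",
    "great goal in the football match",
]
docs = [tokenize(t) for t in corpus]
model = fit_idf(docs)                      # "training" step, done once in batch
vectors = [tfidf(model, d) for d in docs]

# Scoring step; in a streaming setting this would run once per incoming tweet.
best, scores = most_similar(model, vectors, "Who scored the goal in tonight's match?")
print(best, [round(s, 3) for s in scores])
```

The split between `fit_idf` (run once over historical data) and `most_similar` (run per incoming item) mirrors the batch-train, stream-score pattern from the demo as described above. A production featurizer would also need to handle vocabulary growth and distributed document-frequency counting, which this sketch ignores.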