[jira] [Commented] (SPARK-2511) Add TF-IDF featurizer

Michael Yannakopoulos (JIRA) Sun, 20 Jul 2014 20:40:15 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068159#comment-14068159 ]


Michael Yannakopoulos commented on SPARK-2511:
----------------------------------------------

I am very interested in this topic. I have seen the video of the Databricks
Cloud demo, and they actually use MLlib to generate the TF-IDF model.
Afterwards, they use the trained model to find similarities between words
derived from tweets in a streaming environment.

Is this TF-IDF implementation actually available somewhere (as source code),
or do we need to provide our own implementation? I am willing to help with
the development of this new model/functionality.
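
To make the question concrete, here is a minimal sketch of what a TF-IDF
featurizer computes, written in plain Python rather than against MLlib. It is
not the implementation from the Databricks Cloud demo, which is not shown in
this thread. The function name `tf_idf`, the sample documents, and the
smoothed IDF formula log((N + 1) / (df + 1)) are all assumptions chosen for
illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration only; not an MLlib API.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): log((N + 1) / (df + 1)).
    idf = {term: math.log((n_docs + 1) / (count + 1))
           for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


# Made-up sample documents, already tokenized on whitespace.
docs = [
    "spark makes streaming simple".split(),
    "spark mllib makes models".split(),
    "tweets arrive in a stream".split(),
]
vectors = tf_idf(docs)
```

Terms that occur in fewer documents get a larger weight, so in the first
document "streaming" is weighted higher than "spark".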

Thanks,
Michael

> Add TF-IDF featurizer
> ---------------------
>
>                 Key: SPARK-2511
>                 URL: https://issues.apache.org/jira/browse/SPARK-2511
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Port the TF-IDF implementation that was used in the Databricks Cloud demo to 
> MLlib.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
