[ https://issues.apache.org/jira/browse/MAHOUT-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984801#comment-13984801 ]
Drew Farris commented on MAHOUT-1252:
-------------------------------------
Ok. Are we preferring the Scala or Java Spark APIs moving forward?
As far as word2vec goes, I haven't worked with it directly, but it looks very
interesting (Sumeet Vij and colleagues presented on it at BigConf.io).
The functionality would be great to have as part of the Mahout tooling. Radim
Řehůřek has written about his experience porting word2vec to Python/gensim, so
his posts at
http://radimrehurek.com/2013/09/deep-learning-with-word2vec-and-gensim/ (and
parts 2 and 3) should be a useful reference for an implementation.
I think that providing basic tf and tf-idf bag-of-words vectorization will be
useful and may be more straightforward to implement in the short term. That
said, I have no sense yet of how complex a word2vec port would be.
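For concreteness, "basic tf and tf-idf bag-of-words vectorization" can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not Mahout's API: the class and method names (BagOfWordsSketch, buildDictionary, tfVector, documentFrequencies, tfIdfVector) are hypothetical, input is assumed to be already tokenized, and the weighting is the textbook tf * ln(N / df), which may differ from the weighting seq2sparse actually applies. The term-to-id map plays roughly the role of the dictionary that a DictionaryType (such as the proposed FST) would hold.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of bag-of-words TF and TF-IDF vectorization (not Mahout's API). */
public class BagOfWordsSketch {

    /** Assigns each distinct term a dense integer id, in sorted term order. */
    static Map<String, Integer> buildDictionary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        Map<String, Integer> dictionary = new HashMap<>();
        int nextId = 0;
        for (String term : terms) {
            dictionary.put(term, nextId++);
        }
        return dictionary;
    }

    /** Raw term counts for one document, keyed by dictionary id (a sparse vector).
     *  Assumes every token is present in the dictionary. */
    static Map<Integer, Double> tfVector(List<String> doc, Map<String, Integer> dictionary) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (String term : doc) {
            tf.merge(dictionary.get(term), 1.0, Double::sum);
        }
        return tf;
    }

    /** Document frequency: number of documents containing each term id at least once. */
    static Map<Integer, Integer> documentFrequencies(List<List<String>> docs,
                                                     Map<String, Integer> dictionary) {
        Map<Integer, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new TreeSet<>(doc)) { // count each term once per document
                df.merge(dictionary.get(term), 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weights each raw count by idf = ln(N / df); a term found in every document gets 0. */
    static Map<Integer, Double> tfIdfVector(Map<Integer, Double> tf,
                                            Map<Integer, Integer> df, int numDocs) {
        Map<Integer, Double> weighted = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            weighted.put(e.getKey(), e.getValue() * idf);
        }
        return weighted;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("mahout", "spark", "spark"),
            Arrays.asList("mahout", "lucene"));
        Map<String, Integer> dictionary = buildDictionary(docs);
        Map<Integer, Integer> df = documentFrequencies(docs, dictionary);
        for (List<String> doc : docs) {
            Map<Integer, Double> tf = tfVector(doc, dictionary);
            System.out.println(tf + " -> " + tfIdfVector(tf, df, docs.size()));
        }
    }
}
```

In this toy corpus "mahout" occurs in both documents, so its tf-idf weight is 0, while "spark" (twice in one document) is weighted 2 * ln 2. A distributed version would compute the dictionary and document frequencies in separate passes over the corpus, which is why the dictionary representation matters for seq2sparse performance.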
> Add support for Finite State Transducers (FST) as a DictionaryType.
> -------------------------------------------------------------------
>
> Key: MAHOUT-1252
> URL: https://issues.apache.org/jira/browse/MAHOUT-1252
> Project: Mahout
> Issue Type: Improvement
> Components: Integration
> Affects Versions: 0.7
> Reporter: Suneel Marthi
> Assignee: Suneel Marthi
> Fix For: 1.0
>
>
> Add support for Finite State Transducers (FST) as a DictionaryType; this
> should result in an order-of-magnitude speedup of seq2sparse.
--
This message was sent by Atlassian JIRA
(v6.2#6252)