Re: LDA runtimes

Grant Ingersoll Wed, 23 Sep 2009 04:01:49 -0700


On Sep 23, 2009, at 6:05 AM, Levy, Mark wrote:

I've started to experiment with LDA and am finding that it creates only
a single long-running map task for each iteration, which doesn't scale
well. The map is taking 20 mins for 10k of my input SparseVectors, and 5
hours for 100k (the vocabulary size also grows when there are more
vectors).
Is this expected or am I doing something wrong? Are there any existing
performance benchmarks?

That's pretty new code, so I doubt there is much in the way of benchmarks. If you can share your vectors (the serialized ones, not the originals with text), then we can profile and look into it a bit more.

Also, you may want to look at MAHOUT-165 in JIRA, as there are some performance improvements for sparse vectors using primitives.



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
