Re: Vector creation - out of memory error

Grant Ingersoll Mon, 20 Jul 2009 17:50:12 -0700



On Jul 20, 2009, at 2:40 PM, Florian Leibert wrote:

Hi,
I'm trying to create vectors with Mahout as explained in
http://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text, however I keep running out of heap. My heap is set to 2 GB already and I use
these parameters:
"java org.apache.mahout.utils.vectors.Driver --dir /LUCENE/ind --output
/user/florian/index-vectors-01 --field content --dictOut
/user/florian/index-dict-01 --weight TF".

Hmm, 6GB isn't all that large, but the primary memory usage is going to be due to the CachedTermInfo, which loads all the terms into memory. This is an interface that can be implemented in other, slower, ways, but we'll have to change the Driver program to allow for that.
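
To make the trade-off concrete, here is a rough sketch of an eager lookup next to a lazy one. The names TermLookup, CachedLookup and LazyLookup are made up for illustration and are not Mahout's actual classes; the lazy variant assumes some backing store (for example the index itself) that can answer one term at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TermLookupSketch {
    /** Hypothetical stand-in for a term-info interface; NOT Mahout's API. */
    interface TermLookup {
        /** Returns the vector index assigned to the term, or -1 if unknown. */
        int indexOf(String term);
    }

    /** Eager: copies every term into a map up front. Fast lookups, but memory grows with the vocabulary. */
    static final class CachedLookup implements TermLookup {
        private final Map<String, Integer> indexByTerm = new HashMap<>();

        CachedLookup(Iterable<String> allTerms) {
            int next = 0;
            for (String term : allTerms) {
                // Assign the next free index the first time a term is seen.
                if (!indexByTerm.containsKey(term)) {
                    indexByTerm.put(term, next++);
                }
            }
        }

        @Override
        public int indexOf(String term) {
            Integer idx = indexByTerm.get(term);
            return idx == null ? -1 : idx;
        }
    }

    /** Lazy: asks a backing store on every call. Little heap used here, but each lookup is slower. */
    static final class LazyLookup implements TermLookup {
        private final Function<String, Integer> source; // assumed per-term query, e.g. against the index

        LazyLookup(Function<String, Integer> source) {
            this.source = source;
        }

        @Override
        public int indexOf(String term) {
            Integer idx = source.apply(term);
            return idx == null ? -1 : idx;
        }
    }
}
```

The Driver would only need to depend on the interface, so either variant could be swapped in.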


How many unique terms do you have in the content field?

You have java -Xmx2000M set as the heap size?
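
One quick way to check that the flag actually reaches the JVM is to print the maximum heap from inside a small program started with the same options. This is only a sketch; HeapCheck is a made-up class name, and the reported value is typically a little below what -Xmx requests.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run as: java -Xmx2000M HeapCheck
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```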

My index currently is about 6 GB large. Is there any way to compute the
vectors in a distributed manner?


There will be, but there isn't yet, I suspect.

What's the largest index someone has
created vectors from?

It's pretty new code, and I've only tested it on relatively small indexes (a few hundred MB) so far, but the only gating issue memory-wise is the CachedTermInfo.

Sorry I don't have better answers, but I am willing to help improve it. I will try to use some bigger indexes soon.


-Grant
