Hi,

I am trying to create Lucene indexes using the
"contrib/index/hadoop-0.19.1-index.jar" provided by Hadoop.
Since it runs as a map-reduce job, I expected it to process large
data sets very fast, and it does handle small inputs (< 5 MB) very quickly.

Now 5 GB of input data is provided; and the fun starts :)

It goes out of memory. I increased the "mapred.child.java.opts" parameter in
the file "hadoop-default.xml" to -Xmx1000m.
Processing then ran smoothly for about 1.5 hours, completing 30% of the job,
after which the master node hung.
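For reference, this is the property I changed (note that in Hadoop 0.19 local overrides normally go in "hadoop-site.xml" rather than "hadoop-default.xml"):

```xml
<!-- Heap size for the map/reduce child JVMs.
     This override would usually be placed in hadoop-site.xml. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1000m</value>
</property>
```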

Is there any way to get "contrib/index/hadoop-0.19.1-index.jar" going?
Is there a memory leak in the jar?

Can you suggest some alternatives?

Thanks,
- Bhushan
