Hi, this is my first post to the Hadoop list, and I have not yet written a program using the framework.
I'm querying several large Lucene indexes and generating about 30 text files of 1-3MB each. These files contain metadata about the indexed documents along with a corresponding MD5 key; this unique key exists for each document within the Lucene index and matches the keys in the metadata.

My current solution is to read about 50-80MB of text into memory, run some routines, and generate double ranking weights for each document, separate from and complementary to Lucene scoring (ratings). I then reassemble the documents, including the new fields, by id. It works now, but the JVM approaches 1GB of resident private memory, so it isn't scalable.

My goal is to move this into a MapReduce job, but I don't yet know how. ;) What steps are required to turn several Java methods and data sets into a SequenceFile?
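For concreteness, here is my untested guess at the conversion step, using SequenceFile.createWriter with the MD5 as the key and the metadata line as the value (the output path and the sample record below are made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MetadataToSequenceFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/peterw/metadata.seq"); // made-up path

    // Key = document MD5, value = its metadata fields.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
    try {
      // In the real version this loop would read the ~30 metadata files.
      writer.append(
          new Text("d41d8cd98f00b204e9800998ecf8427e"), // MD5 key
          new Text("title=Example\tauthor=Someone"));   // sample metadata
    } finally {
      writer.close();
    }
  }
}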
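And here is roughly how I picture the job itself. The map would pass each record through keyed by its MD5, so all the metadata for one document arrives at a single reduce call, where the ranking weight would be computed and attached. computeWeight below is just a stand-in for my routines, and I have not tested any of this:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Identity-style map: emit (md5, metadata) so records group by document.
// Assumes SequenceFileInputFormat over the file written above.
public class Md5Mapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  public void map(Text md5, Text metadata,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(md5, metadata);
  }
}

// Reduce: merge a document's metadata and compute its ranking weight.
class WeightReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, DoubleWritable> {
  public void reduce(Text md5, Iterator<Text> values,
                     OutputCollector<Text, DoubleWritable> output,
                     Reporter reporter) throws IOException {
    double weight = 0.0;
    while (values.hasNext()) {
      weight += computeWeight(values.next().toString());
    }
    output.collect(md5, new DoubleWritable(weight));
  }

  // Stand-in for the real scoring routine.
  private double computeWeight(String metadata) {
    return metadata.length();
  }
}

Am I on the right track, or is there a simpler way to structure this?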
Kind Regards,

Peter W.