I am running over a 100-million-row NoSQL set and unfortunately building 1
million indexes. Each row I read may or may not be for the index I just wrote
to, so I can't keep an IndexWriter open very long. I am currently simulating
how long it would take to build all the indexes, and it looks like it is
somewhere around 17 hours :(
Any other ways to optimize this code (and then maybe I can apply it to our
index map/reduce job)? Thanks, Dean. This is done in 20 different threads, and
again, taking IndexWriter out of the loop is probably not an option: as I go
over the 100 million records, each one needs a different IndexWriter, and I
can't have too many IndexWriters open.
Directory dir = FSDirectory.open(new File(INDEX_DIR_PREFIX + this.account));
for (int i = 0; i < 125; i++) {
    // Open and close a writer per document to simulate the real job,
    // where consecutive rows rarely target the same index.
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_32,
            new KeywordAnalyzer());
    IndexWriter writer = new IndexWriter(dir, conf);

    LocalDate date = new LocalDate().plusDays(this.r.nextInt(1000));
    int next = this.r.nextInt(5000);
    int name = this.r.nextInt(1000);
    Document document = createDocument("temp" + next, "dean" + name,
            "some url", date);

    writer.addDocument(document);
    writer.close();
}
Hmmmm, maybe I could use an IndexWriter cache of 2000 to leave them open until
evicted? I can't think of anything else to help, though. Ideas?
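For what it's worth, the cache idea could be sketched with a LinkedHashMap in
access order, closing the least-recently-used writer on eviction. This is just
a hypothetical sketch: the class name WriterCache is my own, and a plain
Closeable stands in here for IndexWriter so the shape is clear without the
Lucene dependency.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of open writers, keyed by account/index name.
// V is anything Closeable (an IndexWriter in the real job); once more
// than maxOpen writers are cached, the least recently used one is
// closed and dropped.
class WriterCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
    private final int maxOpen;

    WriterCache(int maxOpen) {
        super(16, 0.75f, true); // accessOrder=true -> LRU iteration order
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxOpen) {
            try {
                eldest.getValue().close(); // flush before dropping it
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return true; // evict the least recently used writer
        }
        return false;
    }
}
```

With 2000 writers held open, file descriptors and the per-writer RAM buffer
would presumably be the limits to watch, so the maxOpen number may need tuning
well below 2000.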
Thanks,
Dean