I select it in parts, in chunks of 5,000 records with the LIMIT keyword. The
thing is, it starts very fast but then slows down, so I doubt it has to do
with tokenizing.
2006/7/3, Aleksander M. Stensby <[EMAIL PROTECTED]>:
My guess is that if you actually do a complete SELECT * from your DB and
manage all the objects at once, this will be a problem for your JVM; maybe
running out of memory is what you are encountering. Strings tend to be a bit
of a memory issue in Java :(

My suggestion is that you paginate and offset while fetching from the DB
(and index the results as you go). So you could keep track of your "last
indexed" doc, predefine a "step" variable, then select with LIMIT
lastIndexed, step, updating your lastIndexed variable on each iteration (a
sketch of such a loop appears below). Of course the step variable should not
be too low either, because that would put too much load on the database
connection.

Also, you might reconsider tokenizing all the fields... is it necessary? Do
you have to store all of them? (I don't know the usage of your index, so it
is a bit hard to know for sure.)

- Aleksander

On Mon, 03 Jul 2006 10:52:15 +0200, peter velthuis <[EMAIL PROTECTED]> wrote:

> When I start the program it's fast, about 10 docs per second, but after
> about 15,000 it slows down very much. Now it does 1 doc per second and it
> is at #40,000 after a whole night of indexing. These are VERY small docs
> with very little information. This is what I index and how:
>
> Document doc = new Document();
> doc.add(new Field("field1", field1, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("field2", field2, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("field3", field3, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("field4", field4, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("field5", field5, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("field6", field6, Field.Store.YES, Field.Index.TOKENIZED));
> doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED));
>
> and this:
>
> String indexDirectory = "lucdex2";
>
> private void indexDocument(Document document) throws Exception {
>     Analyzer analyzer = new StandardAnalyzer();
>     IndexWriter writer = new IndexWriter(indexDirectory, analyzer, false);
>     // writer.setUseCompoundFile(true);
>     writer.addDocument(document);
>     writer.optimize();
>     writer.close();
> }
>
> I read the data out of a MySQL database, but that can't be the problem,
> since the data is in memory.
>
> Also, I use Cygwin; when I try indexing on Windows in a program like
> NetBeans or BlueJ, it crashes Windows after about 5000 docs. It says
> "beep" and does a complete shutdown...
>
> Peter

--
Aleksander M. Stensby
Software Developer
Integrasco A/S
[EMAIL PROTECTED]
Tlf.: +47 41 22 82 72
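A minimal sketch of the paginated fetch-and-index loop described above,
written against the same Lucene 1.9/2.0-era API as the quoted snippet. The
JDBC URL, table name, and column names are placeholders, not from the
thread; the sketch also deliberately opens one IndexWriter for the whole run
and calls optimize() only once at the end, unlike the per-document
indexDocument() quoted above.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class PagedIndexer {
        private static final int STEP = 5000; // rows per batch; tune against DB load

        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "pass");
            // Open the writer once and reuse it for every batch,
            // instead of reopening and optimizing it per document.
            IndexWriter writer = new IndexWriter("lucdex2", new StandardAnalyzer(), true);

            int lastIndexed = 0;
            while (true) {
                // Placeholder table/column names; LIMIT offset, count as suggested.
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT field1, contents FROM docs LIMIT ?, ?");
                ps.setInt(1, lastIndexed); // offset = last indexed row
                ps.setInt(2, STEP);        // fetch at most STEP rows
                ResultSet rs = ps.executeQuery();
                int fetched = 0;
                while (rs.next()) {
                    Document doc = new Document();
                    doc.add(new Field("field1", rs.getString(1),
                            Field.Store.YES, Field.Index.TOKENIZED));
                    doc.add(new Field("contents", rs.getString(2),
                            Field.Store.NO, Field.Index.TOKENIZED));
                    writer.addDocument(doc);
                    fetched++;
                }
                rs.close();
                ps.close();
                if (fetched == 0) break; // no rows left
                lastIndexed += fetched;  // advance the offset for the next batch
            }
            writer.optimize(); // merge segments once, at the end
            writer.close();
            conn.close();
        }
    }

Note that the quoted indexDocument() reopens the IndexWriter and calls
optimize() for every single document; since optimize() rewrites the whole
index, its cost grows as the index grows, which would also explain the
steady slowdown reported in the thread.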