Hi,

Maybe your heap size is just too big, so your JVM spends too much time in GC? The setup you described in your last eMail is the "officially supported" setup :-) Lucene has no problem with that setup and can index fine. Be sure:

- Don't give too much heap to your indexing app. Larger heaps create much more GC load.
- Use a suitable garbage collector (e.g. the Java 7 G1 collector or the Java 6 CMS collector). Other garbage collectors may do GCs in a single thread ("stop-the-world").
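For example, start the indexer with something like this (just a starting point, not a recommendation; the heap size and "indexer.jar" are placeholders, tune for your machine and index):

  # Java 7: G1 collector
  java -Xmx2g -XX:+UseG1GC -jar indexer.jar

  # Java 6: CMS collector
  java -Xmx2g -XX:+UseConcMarkSweepGC -jar indexer.jar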
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
> Sent: Saturday, November 23, 2013 4:46 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene multithreaded indexing problems
>
> So we return to the initially described setup: multiple parallel workers,
> each making "parse + indexWriter.addDocument()" for single documents with
> no synchronization on my side. This setup was also bad in terms of memory
> consumption and thread blocking, as I reported.
>
> Or did I misunderstand you?
>
> --
> Igor
>
> 22.11.2013, 23:34, "Uwe Schindler" <u...@thetaphi.de>:
> > Hi,
> >
> > Don't use addDocuments. That method is made for so-called block indexing
> > (where all documents need to be in one block for block joins). Call
> > addDocument for each document, possibly from many threads. That way
> > Lucene can better handle multithreading and free memory early. There is
> > really no need to use bulk adds; they are solely for block joins, where
> > docs need to be sequential and without gaps.
> >
> > Uwe
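To make this concrete, the pattern looks roughly like the following. This is an untested sketch against the Lucene 4.x API; the index path, analyzer, pool size, field name, and the hard-coded input strings are placeholders for your own parsing code:

import java.io.File;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
        new StandardAnalyzer(Version.LUCENE_46));
    // Flush to disk by RAM usage, independent of document count:
    iwc.setRAMBufferSizeMB(64);
    final IndexWriter writer =
        new IndexWriter(FSDirectory.open(new File("/path/to/index")), iwc);

    // One small task per document, no synchronization around the writer:
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (final String raw : Arrays.asList("first doc", "second doc")) {
      pool.execute(new Runnable() {
        public void run() {
          try {
            // Parse 'raw' into a Document here:
            Document doc = new Document();
            doc.add(new TextField("body", raw, Field.Store.NO));
            // Thread-safe; the Document is free for GC afterwards:
            writer.addDocument(doc);
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    writer.close(); // commits and releases all remaining resources
  }
}

The important points: IndexWriter is thread-safe, so addDocument needs no external synchronization, and with setRAMBufferSizeMB the writer flushes segments on its own; there is no need to call commit() just to free memory.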
> > Igor Shalyminov <ishalymi...@yandex-team.ru> schrieb:
> >
> >> - uwe@
> >>
> >> Thanks Uwe!
> >>
> >> I changed the logic so that my workers only parse input docs into
> >> Documents, and indexWriter does addDocuments() by itself for chunks
> >> of 100 Documents.
> >> Unfortunately, the behaviour reproduces: memory usage steadily
> >> increases with the number of processed documents, and at some point
> >> the program runs very slowly, and it seems that only a single thread
> >> is active. It happens after lots of parse/index cycles.
> >>
> >> The current instance is now in the "single-thread" phase with ~100%
> >> CPU and 8397M RES memory (the limit for the VM is -Xmx8G).
> >> My question is: when does addDocuments() release the resources passed
> >> in (the Documents themselves)? Are the resources released when the
> >> call returns, or do I have to call indexWriter.commit() after, say,
> >> each chunk?
> >>
> >> --
> >> Igor
> >>
> >> 21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
> >>> Hi,
> >>>
> >>> Why are you doing this? Lucene's IndexWriter can handle addDocument
> >>> calls from multiple threads. And, since Lucene 4, it will process
> >>> them almost completely in parallel!
> >>> If you do the addDocument calls single-threaded, you are adding an
> >>> additional bottleneck to your application. If you are synchronizing
> >>> on IndexWriter (which I hope you are not doing), things will go
> >>> wrong, too.
> >>>
> >>> Uwe
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: u...@thetaphi.de
> >>>
> >>>> -----Original Message-----
> >>>> From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
> >>>> Sent: Thursday, November 21, 2013 4:45 PM
> >>>> To: java-user@lucene.apache.org
> >>>> Subject: Lucene multithreaded indexing problems
> >>>>
> >>>> Hello!
> >>>>
> >>>> I tried to perform indexing in multiple threads, with a
> >>>> FixedThreadPool of Callable workers.
> >>>> The main operation - parsing a single document and calling
> >>>> addDocument() on the index - is done by a single worker.
> >>>> After parsing a document, a lot (really a lot) of Strings appear,
> >>>> and at the end of the worker's call() all of them go to the
> >>>> indexWriter.
> >>>> I use no merging; the resources are flushed to disk when the
> >>>> segment size limit is reached.
> >>>>
> >>>> The problem is, after a little while (when most of the heap memory
> >>>> is used) the indexer makes no progress, and CPU load is a constant
> >>>> 100% (no difference whether there are 2 threads or 32). So I think
> >>>> at some point garbage collection takes the whole indexing process
> >>>> down.
> >>>>
> >>>> Could you please give some advice on proper concurrent indexing
> >>>> with Lucene?
> >>>> Can there be "memory leaks" somewhere in the indexWriter? Maybe I
> >>>> must perform some operations with the writer to release unused
> >>>> resources from time to time?
> >>>>
> >>>> --
> >>>> Best Regards,
> >>>> Igor
> >
> > --
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, 28213 Bremen
> > http://www.thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org