Hi,

Maybe your heap size is just too big, so your JVM spends too much time in GC?
The setup you described in your last email is the "officially supported" setup
:-) Lucene has no problem with that setup and can index with it. Be sure:
- Don't give too much heap to your indexing app. Larger heaps create much
more GC load.
- Use a suitable garbage collector (e.g., the Java 7 G1 collector or the
Java 6 CMS collector). Other garbage collectors may do GCs in a single
thread ("stop-the-world"); see the example below this list.

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
> Sent: Saturday, November 23, 2013 4:46 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene multithreaded indexing problems
> 
> So we return to the initially described setup: multiple parallel workers,
> each doing "parse + indexWriter.addDocument()" for single documents, with
> no synchronization on my side. As I reported, this setup also had problems
> with memory consumption and thread blocking.
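>
> For reference, here is roughly the setup I mean - a minimal sketch with
> made-up names ("ParallelIndexer", the "body" field), not my actual code:
>
>     import java.io.IOException;
>     import java.util.List;
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>     import java.util.concurrent.TimeUnit;
>     import org.apache.lucene.document.Document;
>     import org.apache.lucene.document.Field;
>     import org.apache.lucene.document.TextField;
>     import org.apache.lucene.index.IndexWriter;
>
>     public class ParallelIndexer {
>       public static void indexAll(final IndexWriter writer,
>                                   List<String> rawDocs,
>                                   int numThreads) throws InterruptedException {
>         ExecutorService pool = Executors.newFixedThreadPool(numThreads);
>         for (final String raw : rawDocs) {
>           pool.submit(new Runnable() {
>             public void run() {
>               // parse inside the worker, then hand the Document straight
>               // to the shared IndexWriter - addDocument() is thread-safe
>               // and needs no external locking
>               Document doc = new Document();
>               doc.add(new TextField("body", raw, Field.Store.NO));
>               try {
>                 writer.addDocument(doc);
>               } catch (IOException e) {
>                 throw new RuntimeException(e);
>               }
>             }
>           });
>         }
>         pool.shutdown();
>         pool.awaitTermination(1, TimeUnit.DAYS);
>       }
>     }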
> 
> Or did I misunderstand you?
> 
> --
> Igor
> 
> 22.11.2013, 23:34, "Uwe Schindler" <u...@thetaphi.de>:
> > Hi,
> > Don't use addDocuments. That method is meant for so-called block
> > indexing (where all documents must be in one block for block joins).
> > Call addDocument for each document, possibly from many threads. That
> > way Lucene can handle multithreading better and free memory early.
> > There is really no need to use bulk adds; they exist solely for block
> > joins, where docs need to be sequential and without gaps.
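> >
> > A rough illustration of the difference (a sketch with a hypothetical
> > helper class; Lucene 4.x API):
> >
> >     import java.io.IOException;
> >     import java.util.Arrays;
> >     import org.apache.lucene.document.Document;
> >     import org.apache.lucene.index.IndexWriter;
> >
> >     class AddVersusAddDocuments {
> >       static void example(IndexWriter writer, Document doc,
> >                           Document child1, Document child2,
> >                           Document parent) throws IOException {
> >         // normal, multithread-friendly path: one doc per call
> >         writer.addDocument(doc);
> >
> >         // block indexing: only when the children and their parent
> >         // (parent LAST) must stay in one contiguous, gapless block
> >         // for block-join queries
> >         writer.addDocuments(Arrays.asList(child1, child2, parent));
> >       }
> >     }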
> >
> > Uwe
> >
> > Igor Shalyminov <ishalymi...@yandex-team.ru> wrote:
> >
> >> - uwe@
> >>
> >> Thanks Uwe!
> >>
> >> I changed the logic so that my workers only parse input docs into
> >> Documents, and indexWriter does addDocuments() by itself for
> >> chunks of 100 Documents.
> >> Unfortunately, the behaviour still reproduces: memory usage
> >> gradually increases with the number of processed documents, and at
> >> some point the program runs very slowly, apparently with only a
> >> single thread active.
> >> It happens after lots of parse/index cycles.
> >>
> >> The current instance is now in the "single-thread" phase at ~100%
> >> CPU and 8397M RES memory (the limit for the VM is -Xmx8G).
> >> My question is: when does addDocuments() release the resources
> >> passed in (the Documents themselves)?
> >> Are the resources released when the call returns, or do I have to
> >> call indexWriter.commit() after, say, each chunk?
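> >>
> >> (For context, this is roughly how I configure the writer - a
> >> sketch with a made-up path and buffer size, Lucene 4.x API:)
> >>
> >>     import java.io.File;
> >>     import java.io.IOException;
> >>     import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >>     import org.apache.lucene.index.IndexWriter;
> >>     import org.apache.lucene.index.IndexWriterConfig;
> >>     import org.apache.lucene.store.FSDirectory;
> >>     import org.apache.lucene.util.Version;
> >>
> >>     class WriterSetup {
> >>       static IndexWriter open() throws IOException {
> >>         IndexWriterConfig iwc = new IndexWriterConfig(
> >>             Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
> >>         // documents are buffered in RAM and flushed to a new
> >>         // segment once this budget is exceeded
> >>         iwc.setRAMBufferSizeMB(256.0);
> >>         return new IndexWriter(
> >>             FSDirectory.open(new File("/path/to/index")), iwc);
> >>       }
> >>     }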
> >>
> >> --
> >> Igor
> >>
> >> 21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
> >>>  Hi,
> >>>
> >>> Why are you doing this? Lucene's IndexWriter can handle
> >>> addDocuments in multiple threads. And, since Lucene 4, it will
> >>> process them almost completely in parallel!
> >>>
> >>> If you do the addDocuments single-threaded, you are adding an
> >>> additional bottleneck to your application. If you synchronize on
> >>> IndexWriter (which I hope you do not do), things will go wrong,
> >>> too.
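> >>>
> >>> For example (a sketch of the anti-pattern versus the plain call):
> >>>
> >>>     import java.io.IOException;
> >>>     import org.apache.lucene.document.Document;
> >>>     import org.apache.lucene.index.IndexWriter;
> >>>
> >>>     class SyncAntiPattern {
> >>>       // ANTI-PATTERN: one monitor serializes all worker threads
> >>>       static void bad(IndexWriter w, Document doc) throws IOException {
> >>>         synchronized (w) {
> >>>           w.addDocument(doc);
> >>>         }
> >>>       }
> >>>       // correct: IndexWriter synchronizes internally at a much
> >>>       // finer granularity
> >>>       static void good(IndexWriter w, Document doc) throws IOException {
> >>>         w.addDocument(doc);
> >>>       }
> >>>     }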
> >>>  Uwe
> >>>
> >>>  -----
> >>>  Uwe Schindler
> >>>  H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>  http://www.thetaphi.de
> >>>  eMail: u...@thetaphi.de
> >>>>   -----Original Message-----
> >>>>   From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
> >>>>   Sent: Thursday, November 21, 2013 4:45 PM
> >>>>   To: java-user@lucene.apache.org
> >>>>   Subject: Lucene multithreaded indexing problems
> >>>>
> >>>> Hello!
> >>>>
> >>>> I tried to perform indexing in multiple threads, with a
> >>>> FixedThreadPool of Callable workers.
> >>>> The main operation - parsing a single document and calling
> >>>> addDocument() on the index - is done by a single worker.
> >>>> After parsing a document, a lot (really a lot) of Strings appear,
> >>>> and at the end of the worker's call() all of them go to the
> >>>> indexWriter.
> >>>> I use no merging; the resources are flushed to disk when the
> >>>> segment size limit is reached.
> >>>>
> >>>> The problem is, after a little while (when most of the heap
> >>>> memory is used) the indexer makes no progress, and CPU load is a
> >>>> constant 100% (no difference whether there are 2 threads or 32).
> >>>> So I think that at some point garbage collection takes the whole
> >>>> indexing process down.
> >>>>
> >>>> Could you please give some advice on proper concurrent indexing
> >>>> with Lucene?
> >>>> Can there be "memory leaks" somewhere in the indexWriter? Maybe I
> >>>> must perform some operations on the writer to release unused
> >>>> resources from time to time?
> >>>>
> >>>> --
> >>>> Best Regards,
> >>>> Igor
> >
> > --
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, 28213 Bremen
> > http://www.thetaphi.de
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
