On Mon, Jul 12, 2010 at 11:14 AM, sarfaraz masood <sarfarazmasood2...@yahoo.com> wrote:
> 1) Reuse field & document objects to reduce the GC overhead using the
> field.setValue() method. By doing this, instead of speeding up, the
> indexing speed reduced drastically. I know this is unusual, but that's
> what happened.

GC overhead is much, much less on recent JVMs such as the one you are
using. It still pays very large benefits to avoid *copying*, but it rarely
pays to avoid allocating. You should look at the new TokenStream API.

> 2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs().
> The default value for both is 10. I increased the value to 1000; by doing
> so, the number of .cfs files in the index folder increased many-fold, and
> I got java.io.IOException: Too many open files.

Have you set this limit to the maximum possible? It is common for the
default limit to be unreasonably small.

> So where am I going wrong? How do I overcome these problems and speed up
> my indexing process?

Another thing you might investigate is indexing on multiple machines in
anticipation of doing sharded search using Solr Cloud or Katta. That will
have the largest impact on total index time of any change that you can do
relatively easily.
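As a quick way to see what descriptor limit your JVM is actually running under, something like the sketch below works on Sun/Oracle JDKs on Unix-like systems. It uses the com.sun.management.UnixOperatingSystemMXBean extension, which is JDK-specific, so treat it as a diagnostic hack rather than portable code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdLimit {
    // Returns the process's max open-file limit, or -1 if the
    // platform MXBean extension is not available (e.g. on Windows).
    public static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max open files: " + maxOpenFiles());
    }
}
```

If this prints something like 1024 (a common default), raising it with `ulimit -n` before starting the JVM is the usual fix.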
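For a back-of-the-envelope feel for why mergeFactor=1000 blows through the descriptor limit: Lucene keeps up to (mergeFactor - 1) segments per merge level before merging them, and each non-compound segment is several files. The numbers below (8 files per segment, a couple of levels) are rough assumptions, not exact Lucene internals, but the shape of the math is the point:

```java
public class SegmentFileEstimate {
    // Worst-case file count: up to (mergeFactor - 1) segments at each
    // merge level, times a rough per-segment file count.
    static long worstCaseFiles(int mergeFactor, int levels, int filesPerSegment) {
        return (long) (mergeFactor - 1) * levels * filesPerSegment;
    }

    public static void main(String[] args) {
        int filesPerSegment = 8; // rough count for a non-compound segment
        // default mergeFactor=10, a few levels deep: manageable
        System.out.println(worstCaseFiles(10, 4, filesPerSegment));
        // mergeFactor=1000: tens of thousands of files even at 2 levels
        System.out.println(worstCaseFiles(1000, 2, filesPerSegment));
    }
}
```

So a modest mergeFactor increase (say 10 to 30) is usually the right experiment; jumping to 1000 multiplies the open-file requirement by roughly two orders of magnitude.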