Another thing is to limit the max # merge threads CMS will run at once. It defaults to 3 now.
Mike On Thu, Mar 26, 2009 at 2:08 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote: > I used the NoMergePolicy to build the index as I noticed the indexing is > faster, meaning the system simply creates large multi-megabyte segments in > the ram buffer, flushes them out and doesn't worry about merging which > causes massive disk trashing. I am pondering some benchmarks to find the > optimal merge policy for realtime search, I'm not sure it's always necessary > to merge according to the Log system. > > For example, a merge policy that caps the size of each segment at 250 > megabytes, and does no merging could be interesting for realtime where many > deletes are coming in and the segments with enough deletes need to merged > away in 1-2 hours. Meaning optimizing may not be best as it requires later > large merges. Also an interleaving system that does not perform merges if a > flush is occurring could useful for minimizing disk trash. > > On Wed, Mar 25, 2009 at 3:39 PM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> LuceneError when executed should reproduce the failure. The >> contrib/benchmark libraries are required. MultiThreadDocAdd is a >> multithreaded indexing utility class. >> >> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen < >> jason.rutherg...@gmail.com> wrote: >> >>> Each document is being created in a single thread, and the fields of the >>> document are not being updated elsewhere. I haven't posted the full code >>> yet as it needs to cleaned up. Thanks Mike! >>> >>> >>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless < >>> luc...@mikemccandless.com> wrote: >>> >>>> It looks like you are reusing a Field (the f.setValue(...) calls); are >>>> you sure you're not changing a Document/Field while another thread is >>>> adding it to the index? >>>> >>>> If you can post the full code, then I can try to run it on my >>>> wikipedia dump locally. >>>> >>>> Mike >>>> >>>> Jason Rutherglen <jason.rutherg...@gmail.com> wrote: >>>> > Mike, >>>> > >>>> > It only happens when at least 1 million documents are indexed in a >>>> > multithreaded fashion. Maybe I should post the code? I will try >>>> indexing >>>> > without the payload field, I assume it won't fail because I indexed >>>> > wikipedia before with no issues. >>>> > >>>> > Thanks! >>>> > >>>> > Jason >>>> > >>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless < >>>> > luc...@mikemccandless.com> wrote: >>>> > >>>> >> Hmmmm. >>>> >> >>>> >> Jason is this easily/compactly repeated? EG, try to index the N docs >>>> >> before that one. >>>> >> >>>> >> If you remove the SinglePayloadTokenStream field, does the exception >>>> >> still happen? >>>> >> >>>> >> Mike >>>> >> >>>> >> Jason Rutherglen <jason.rutherg...@gmail.com> wrote: >>>> >> > While indexing using >>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. >>>> The >>>> >> > asserion error is from >>>> TermsHashPerField.comparePostings(RawPostingList >>>> >> p1, >>>> >> > RawPostingList p2). A Payload is added to the document representing >>>> a >>>> >> UID. >>>> >> > Only 1-2 out of 1 million documents indexed generates this error. >>>> >> > >>>> >> > java.lang.AssertionError >>>> >> > problem adding >>>> >> > >>>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia, >>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The >>>> '''Croatian >>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in >>>> >> [[Washington, >>>> >> > D.C.]] It is located on [[Embassy Row]] at 2343 [[Massachusetts >>>> Avenue >>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC >>>> >> > (northwest)|Northwest]] near [[Dupont Circle]]. Previously the >>>> building >>>> >> had >>>> >> > been home to the [[Austrian Embassy in Washington|Austrian >>>> embassy]], but >>>> >> > they left for larger quarters and sold the structure to Croatia in >>>> 1993. >>>> >> > The purchase and renovation of the building was largely paid for by >>>> the >>>> >> > [[Croatian-American]] community. In front of the embassy is a large >>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]]. >>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site] >>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign >>>> relations >>>> >> of >>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of >>>> >> Croatia >>>> >> > in Washington> >>>> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006 >>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107> >>>> >> > >>>> >> >>>> indexed,tokenized<_ID:proj.zoie.api.zoieindexreader$singlepayloadtokenstr...@e7b3cf >>>> >> > >>>> >> > indexed<id:667162>> ex: java.lang.AssertionError >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132) >>>> >> > at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145) >>>> >> > at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75) >>>> >> > at >>>> >> > >>>> >> >>>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60) >>>> >> > at >>>> >> > >>>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574) >>>> >> > at >>>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533) >>>> >> > at >>>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442) >>>> >> > at >>>> >> > >>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922) >>>> >> > at >>>> >> > >>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880) >>>> >> > >>>> >> >>>> >> --------------------------------------------------------------------- >>>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >> >>>> >> >>>> > >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>>> >>> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org