Another option is to limit the maximum number of merge threads that
ConcurrentMergeScheduler (CMS) will run at once.  It defaults to 3 now.
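
To illustrate what such a cap does, here is a minimal sketch in plain Java that bounds the number of concurrently running background "merges" with a semaphore. It is not Lucene code: `BoundedMergeSketch`, `merge`, and `demo` are hypothetical names, and the sleep stands in for real merge I/O. In the 2.x-era Lucene API the corresponding knob should be `ConcurrentMergeScheduler.setMaxThreadCount(int)`, but check the javadoc for your version.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a merge scheduler with a thread cap; not Lucene code. */
final class BoundedMergeSketch {
    private final Semaphore permits;                          // one permit per allowed merge
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();   // highest concurrency observed

    BoundedMergeSketch(int maxMergeThreads) {
        permits = new Semaphore(maxMergeThreads);
    }

    /** Runs one simulated merge, blocking the caller while the cap is reached. */
    void merge(Runnable mergeWork) {
        permits.acquireUninterruptibly();
        try {
            peak.accumulateAndGet(running.incrementAndGet(), Math::max);
            mergeWork.run();
        } finally {
            running.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    /** Submits `merges` simulated merges at once and reports the peak concurrency. */
    static int demo(int cap, int merges) {
        BoundedMergeSketch scheduler = new BoundedMergeSketch(cap);
        ExecutorService pool = Executors.newFixedThreadPool(merges);
        for (int i = 0; i < merges; i++) {
            pool.execute(() -> scheduler.merge(BoundedMergeSketch::simulatedMerge));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated merges did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return scheduler.peakConcurrency();
    }

    /** Placeholder for merge work (assumption: 20 ms of sleeping). */
    private static void simulatedMerge() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a cap of 2, `BoundedMergeSketch.demo(2, 8)` never reports more than 2 merges in flight, however many are queued. The trade-off is that pending merges wait longer while disk contention with indexing drops.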

Mike

On Thu, Mar 26, 2009 at 2:08 PM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> I used the NoMergePolicy to build the index because I noticed that indexing
> is faster: the system simply creates large multi-megabyte segments in the
> RAM buffer, flushes them out, and doesn't worry about merging, which causes
> massive disk thrashing.  I am pondering some benchmarks to find the optimal
> merge policy for realtime search; I'm not sure it's always necessary to
> merge according to the Log system.
>
> For example, a merge policy that caps the size of each segment at 250
> megabytes and does no merging could be interesting for realtime, where many
> deletes are coming in and the segments with enough deletes need to be merged
> away in 1-2 hours.  This means optimizing may not be best, as it requires
> large merges later.  Also, an interleaving system that does not perform
> merges while a flush is occurring could be useful for minimizing disk thrash.
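
The selection rule described above can be sketched in plain Java, under stated assumptions. This is not a Lucene `MergePolicy`: `Segment`, `selectForRewrite`, and `atSizeCap` are hypothetical names. The 30% delete-ratio threshold is an arbitrary assumption. Only the 250 MB cap comes from the message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "no size-based merging, rewrite on deletes" rule. */
final class DeleteDrivenMergeSketch {
    /** Size cap per segment, taken from the message (250 megabytes). */
    static final long MAX_SEGMENT_BYTES = 250L * 1024 * 1024;
    /** Assumed threshold: rewrite a segment once 30% of its docs are deleted. */
    static final double DELETE_RATIO_THRESHOLD = 0.30;

    /** Minimal stand-in for per-segment statistics. */
    static final class Segment {
        final String name;
        final long sizeBytes;
        final int maxDoc;
        final int deletedDocs;

        Segment(String name, long sizeBytes, int maxDoc, int deletedDocs) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        double deleteRatio() {
            return maxDoc == 0 ? 0.0 : (double) deletedDocs / maxDoc;
        }
    }

    /** Picks only segments carrying enough deletes; size never triggers a merge. */
    static List<Segment> selectForRewrite(List<Segment> segments) {
        List<Segment> selected = new ArrayList<>();
        for (Segment s : segments) {
            if (s.deleteRatio() >= DELETE_RATIO_THRESHOLD) {
                selected.add(s);
            }
        }
        return selected;
    }

    /** True once a segment has reached the cap and should not grow further. */
    static boolean atSizeCap(Segment s) {
        return s.sizeBytes >= MAX_SEGMENT_BYTES;
    }
}
```

A real policy would also need to decide how often to re-check segments and whether several rewritten segments may be combined while staying under the cap. Both questions are left open here.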
>
> On Wed, Mar 25, 2009 at 3:39 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> LuceneError when executed should reproduce the failure.  The
>> contrib/benchmark libraries are required.  MultiThreadDocAdd is a
>> multithreaded indexing utility class.
>>
>> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
>> jason.rutherg...@gmail.com> wrote:
>>
>>> Each document is being created in a single thread, and the fields of the
>>> document are not being updated elsewhere.  I haven't posted the full code
>>> yet as it needs to be cleaned up.  Thanks Mike!
>>>
>>>
>>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>>>> you sure you're not changing a Document/Field while another thread is
>>>> adding it to the index?
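
The hazard raised here is a reused, mutable field being changed by one thread while another thread is still adding it. A common way to keep reuse safe is one reusable instance per thread, sketched below in plain Java. Nothing here is Lucene API: `MutableField` stands in for a reusable `Field`, and `add` stands in for `IndexWriter.addDocument`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: per-thread reuse of a mutable field object. */
final class PerThreadFieldSketch {
    /** Stand-in for a reusable field whose value is overwritten per document. */
    static final class MutableField {
        private String value;

        void setValue(String v) {
            value = v;
        }

        String value() {
            return value;
        }
    }

    /** One reusable instance per thread, so no other thread can mutate it mid-add. */
    private static final ThreadLocal<MutableField> FIELD =
            ThreadLocal.withInitial(MutableField::new);

    /** Stand-in for adding a document: reads whatever the field holds right now. */
    static String add(MutableField f) {
        return f.value();
    }

    /** Counts how often the value read during add differs from the value just set. */
    static int mismatches(int threads, int docsPerThread) {
        AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                MutableField f = FIELD.get();   // this thread's private instance
                for (int d = 0; d < docsPerThread; d++) {
                    String expected = id + ":" + d;
                    f.setValue(expected);
                    if (!expected.equals(add(f))) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mismatches.get();
    }
}
```

Because each worker only ever touches its own instance, the mismatch count stays at zero. Sharing a single `MutableField` across the workers would make the same check racy.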
>>>>
>>>> If you can post the full code, then I can try to run it on my
>>>> wikipedia dump locally.
>>>>
>>>> Mike
>>>>
>>>> Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>>>> > Mike,
>>>> >
>>>> > It only happens when at least 1 million documents are indexed in a
>>>> > multithreaded fashion.  Maybe I should post the code?  I will try
>>>> > indexing without the payload field, I assume it won't fail because I
>>>> > indexed wikipedia before with no issues.
>>>> >
>>>> > Thanks!
>>>> >
>>>> > Jason
>>>> >
>>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>>>> > luc...@mikemccandless.com> wrote:
>>>> >
>>>> >> Hmmmm.
>>>> >>
>>>> >> Jason, is this easily/compactly reproducible?  E.g., try indexing only
>>>> >> the N docs before that one.
>>>> >>
>>>> >> If you remove the SinglePayloadTokenStream field, does the exception
>>>> >> still happen?
>>>> >>
>>>> >> Mike
>>>> >>
>>>> >> Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>>>> >> > While indexing using
>>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
>>>> >> > assertion error is from
>>>> >> > TermsHashPerField.comparePostings(RawPostingList p1, RawPostingList p2).
>>>> >> > A Payload is added to the document representing a UID.
>>>> >> > Only 1-2 out of 1 million documents indexed generate this error.
>>>> >> >
>>>> >> > java.lang.AssertionError
>>>> >> > problem adding
>>>> >> > doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
>>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
>>>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
>>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>>>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building had
>>>> >> > been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
>>>> >> > they left for larger quarters and sold the structure to Croatia in 1993.
>>>> >> > The purchase and renovation of the building was largely paid for by the
>>>> >> > [[Croatian-American]] community.  In front of the embassy is a large
>>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
>>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
>>>> >> > in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>>>> >> > indexed,tokenized<_ID:proj.zoie.api.zoieindexreader$singlepayloadtokenstr...@e7b3cf
>>>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>>>> >> >    at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>>> >> >    at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>>> >> >    at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>>> >> >    at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>>> >> >    at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>>> >> >    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>>> >> >    at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>>> >> >    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>>> >> >    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>>> >> >    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>>> >> >    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>>> >> >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>>> >> >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>>> >> >
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>> >>
>>>> >>
>>>> >
>>>
>>
>
