Re: Throughput doesn't increase when using more concurrent threads

Chris Lamprecht Fri, 10 Mar 2006 14:00:50 -0800

Peter,

I think this is similar to the patch in this bugzilla task:


http://issues.apache.org/bugzilla/show_bug.cgi?id=35838
the patch itself is http://issues.apache.org/bugzilla/attachment.cgi?id=15757

(BTW does JIRA have a way to display the patch diffs?)

The above patch also has a change to SegmentReader to avoid
synchronization on isDeleted().  However, with that patch, you no
longer have the guarantee that one thread will immediately see
deletions by another thread.  This was fine for my purposes, and
resulted in a big performance boost when there were deleted documents,
but it may not be "correct" for others' needs.

-chris
On 3/10/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > 3. Use the ThreadLocal's FieldReader in the document() method.
>
> As I understand it, this means that the document method no longer needs to
> be synchronized, right?
>
> I've made these changes and it does appear to improve performance. Random
> snapshots of the stack traces show only an occasional lock in 'isDeleted'.
> Mostly, though, the threads are busy scoring and adding results to priority
> queues, which is great. I've included some sample stacks, below. I'll report
> the new query rates after it has run for at least overnight, and I'd be
> happy submit these changes to the lucene committers, if interested.
>
> Peter
>
>
> Sample stack traces:
>
> "QueryThread group 1,#8" prio=1 tid=0x0000002ce48eeb80 nid=0x6b87 runnable
> [0x0000000043887000..0x0000000043887bb0]
>     at org.apache.lucene.search.FieldSortedHitQueue.lessThan(
> FieldSortedHitQueue.java:108)
>     at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:61)
>     at org.apache.lucene.search.FieldSortedHitQueue.insert(
> FieldSortedHitQueue.java:85)
>     at org.apache.lucene.search.FieldSortedHitQueue.insert(
> FieldSortedHitQueue.java:92)
>     at org.apache.lucene.search.TopFieldDocCollector.collect(
> TopFieldDocCollector.java:51)
>     at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
>     at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
>     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
>     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
>
> "QueryThread group 1,#5" prio=1 tid=0x0000002ce4d659f0 nid=0x6b84 runnable
> [0x0000000043584000..0x0000000043584d30]
>     at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
>     at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
>     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
>     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
>
> "QueryThread group 1,#4" prio=1 tid=0x0000002ce10afd50 nid=0x6b83 runnable
> [0x0000000043483000..0x0000000043483db0]
>     at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(
> MMapDirectory.java:46)
>     at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:56)
>     at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java
> :101)
>     at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java
> :194)
>     at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:144)
>     at org.apache.lucene.search.ConjunctionScorer.doNext(
> ConjunctionScorer.java:56)
>     at org.apache.lucene.search.ConjunctionScorer.next(
> ConjunctionScorer.java:51)
>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java
> :290)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
>     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
>     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
>
> "QueryThread group 1,#3" prio=1 tid=0x0000002ce48959f0 nid=0x6b82 runnable
> [0x0000000043382000..0x0000000043382e30]
>     at java.util.LinkedList.listIterator(LinkedList.java:523)
>     at java.util.AbstractList.listIterator(AbstractList.java:349)
>     at java.util.AbstractSequentialList.iterator(AbstractSequentialList.java
> :250)
>     at org.apache.lucene.search.ConjunctionScorer.score(
> ConjunctionScorer.java:80)
>     at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java
> :186)
>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java
> :327)
>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java
> :291)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
>     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
>     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
>
>
> On 3/7/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
> >
> > Peter Keegan wrote:
> > > I ran a query performance tester against 8-cpu and 16-cpu Xeon servers
> > > (16/32 cpu hyperthreaded). on Linux. Here are the results:
> > >
> > > 8-cpu:  275 qps
> > > 16-cpu: 305 qps
> > > (the dual-core Opteron servers are still faster)
> > >
> > > Here is the stack trace of 8 of the 16 query threads during the test:
> > >
> > >         at org.apache.lucene.index.SegmentReader.document(
> > SegmentReader.java
> > > :281)
> > >         - waiting to lock <0x0000002adf5b2110> (a
> > > org.apache.lucene.index.SegmentReader)
> > >         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java
> > :83)
> > >         at org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java
> > > :146)
> > >         at org.apache.lucene.search.Hits.doc(Hits.java:103)
> > >
> > > SegmentReader.document is a synchronized method. I have one stored field
> > > (binary, uncompressed) with and average length of 0.5Kb. The retrieval
> > of
> > > this stored field is within this synchronized code. Since I am using
> > > MMapDirectory, does this retrieval need to be synchronized?
> >
> > Yes, since in FieldReader the file positions must be synchronized.
> >
> > The way to avoid this would be to:
> >
> > 1. Add a clone() method to FieldReader that clones it's two IndexInputs.
> > 2. Add a ThreadLocal to SegmentReader whose value is a cloned FieldReader.
> > 3. Use the ThreadLocal's FieldReader in the document() method.
> >
> > TermInfosReader has a similar optimization, using a ThreadLocal
> > containing a SegmentTermEnum for each thread.
> >
> > Doug
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Throughput doesn't increase when using more concurrent threads

Reply via email to