Re: Throughput doesn't increase when using more concurrent threads

Peter Keegan Mon, 13 Mar 2006 06:20:56 -0800

Chris,

Should this patch work against the current code base? I'm getting this
error:


D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch
patching file src/java/org/apache/lucene/index/CompoundFileReader.java
patching file src/java/org/apache/lucene/index/FieldsReader.java
missing header for unified diff at line 45 of patch
can't find file to patch at input line 45
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
| +47,9 @@
|     fieldsStream = d.openInput(segment + ".fdt");
|     indexStream = d.openInput(segment + ".fdx");
|
|+    fstream = new ThreadStream(fieldsStream);
|+    istream = new ThreadStream(indexStream);
|+
|     size = (int)(indexStream.length() / 8);
|   }
|
--------------------------

Thanks,
Peter


On 3/10/06, Chris Lamprecht <[EMAIL PROTECTED]> wrote:
>
> Peter,
>
> I think this is similar to the patch in this bugzilla task:
>
> http://issues.apache.org/bugzilla/show_bug.cgi?id=35838
> the patch itself is
> http://issues.apache.org/bugzilla/attachment.cgi?id=15757
>
> (BTW does JIRA have a way to display the patch diffs?)
>
> The above patch also has a change to SegmentReader to avoid
> synchronization on isDeleted().  However, with that patch, you no
> longer have the guarantee that one thread will immediately see
> deletions by another thread.  This was fine for my purposes, and
> resulted in a big performance boost when there were deleted documents,
> but it may not be "correct" for others' needs.
>
> -chris
> On 3/10/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> > > 3. Use the ThreadLocal's FieldReader in the document() method.
> >
> > As I understand it, this means that the document method no longer needs to
> > be synchronized, right?
> >
> > I've made these changes and it does appear to improve performance. Random
> > snapshots of the stack traces show only an occasional lock in 'isDeleted'.
> > Mostly, though, the threads are busy scoring and adding results to priority
> > queues, which is great. I've included some sample stacks, below. I'll report
> > the new query rates after it has run for at least overnight, and I'd be
> > happy to submit these changes to the Lucene committers, if interested.
> >
> > Peter
> >
> >
> > Sample stack traces:
> >
> > "QueryThread group 1,#8" prio=1 tid=0x0000002ce48eeb80 nid=0x6b87 runnable [0x0000000043887000..0x0000000043887bb0]
> >     at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:108)
> >     at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:61)
> >     at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:85)
> >     at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:92)
> >     at org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:51)
> >     at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
> >     at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
> >     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
> >     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
> >     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
> >     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
> >
> > "QueryThread group 1,#5" prio=1 tid=0x0000002ce4d659f0 nid=0x6b84 runnable [0x0000000043584000..0x0000000043584d30]
> >     at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
> >     at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
> >     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
> >     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
> >     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
> >     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
> >
> > "QueryThread group 1,#4" prio=1 tid=0x0000002ce10afd50 nid=0x6b83 runnable [0x0000000043483000..0x0000000043483db0]
> >     at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:46)
> >     at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:56)
> >     at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:101)
> >     at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:194)
> >     at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:144)
> >     at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
> >     at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
> >     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:290)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
> >     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
> >     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
> >     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
> >     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
> >
> > "QueryThread group 1,#3" prio=1 tid=0x0000002ce48959f0 nid=0x6b82 runnable [0x0000000043382000..0x0000000043382e30]
> >     at java.util.LinkedList.listIterator(LinkedList.java:523)
> >     at java.util.AbstractList.listIterator(AbstractList.java:349)
> >     at java.util.AbstractSequentialList.iterator(AbstractSequentialList.java:250)
> >     at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:80)
> >     at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
> >     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
> >     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
> >     at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
> >     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
> >     at org.apache.lucene.search.Hits.<init>(Hits.java:52)
> >     at org.apache.lucene.search.Searcher.search(Searcher.java:62)
> >
> >
> > On 3/7/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
> > >
> > > Peter Keegan wrote:
> > > > > I ran a query performance tester against 8-cpu and 16-cpu Xeon servers
> > > > > (16/32 cpu hyperthreaded) on Linux. Here are the results:
> > > >
> > > > 8-cpu:  275 qps
> > > > 16-cpu: 305 qps
> > > > (the dual-core Opteron servers are still faster)
> > > >
> > > > > Here is the stack trace of 8 of the 16 query threads during the test:
> > > > >
> > > > >         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:281)
> > > > >         - waiting to lock <0x0000002adf5b2110> (a org.apache.lucene.index.SegmentReader)
> > > > >         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:83)
> > > > >         at org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:146)
> > > > >         at org.apache.lucene.search.Hits.doc(Hits.java:103)
> > > >
> > > > > SegmentReader.document is a synchronized method. I have one stored field
> > > > > (binary, uncompressed) with an average length of 0.5Kb. The retrieval of
> > > > > this stored field is within this synchronized code. Since I am using
> > > > > MMapDirectory, does this retrieval need to be synchronized?
> > >
> > > Yes, since in FieldReader the file positions must be synchronized.
> > >
> > > The way to avoid this would be to:
> > >
> > > > 1. Add a clone() method to FieldReader that clones its two IndexInputs.
> > > > 2. Add a ThreadLocal to SegmentReader whose value is a cloned FieldReader.
> > > 3. Use the ThreadLocal's FieldReader in the document() method.
> > >
> > > TermInfosReader has a similar optimization, using a ThreadLocal
> > > containing a SegmentTermEnum for each thread.
> > >
> > > Doug
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
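
The three steps Doug lists in the quoted message (give the reader a clone()
method, keep one clone per thread in a ThreadLocal, and read through that
clone in document()) can be sketched roughly as below. This is not the
Lucene patch itself. PositionedReader, SegmentLikeReader and
countMismatches are hypothetical names, the byte array is only a stand-in
for the stored-fields file, and ThreadLocal.withInitial assumes Java 8 or
later, which is newer than the Lucene 1.9 code discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadLocalReaderSketch {

    /** Hypothetical stand-in for a reader that keeps a mutable file position. */
    static final class PositionedReader implements Cloneable {
        private final byte[] data; // shared, read-only backing store
        private int position;      // per-instance cursor: the state that is not thread-safe

        PositionedReader(byte[] data) { this.data = data; }

        void seek(int pos) { position = pos; }

        byte readByte() { return data[position++]; }

        /** Step 1: a clone shares the data but gets its own position. */
        @Override
        public PositionedReader clone() {
            try {
                return (PositionedReader) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Hypothetical stand-in for the segment-level reader. */
    static final class SegmentLikeReader {
        // Step 2: each thread lazily receives its own clone of the prototype.
        private final ThreadLocal<PositionedReader> perThread;

        SegmentLikeReader(byte[] data) {
            PositionedReader prototype = new PositionedReader(data);
            this.perThread = ThreadLocal.withInitial(prototype::clone);
        }

        // Step 3: no 'synchronized' here; seek and read touch only this thread's clone.
        byte document(int n) {
            PositionedReader reader = perThread.get();
            reader.seek(n);
            return reader.readByte();
        }
    }

    /** Reads from several threads at once and counts values that came back wrong. */
    static int countMismatches(int threads, int readsPerThread) {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        SegmentLikeReader reader = new SegmentLikeReader(data);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int seed = t;
                Callable<Integer> task = () -> {
                    int bad = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        int n = (seed * 31 + i) % data.length;
                        if (reader.document(n) != (byte) n) bad++;
                    }
                    return bad;
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8, 100_000));
    }
}
```

In this sketch the seek-then-read pair is safe without a lock because no
two threads ever share a cursor. The cost is one extra reader instance per
thread that calls document().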
