Re: Throughput doesn't increase when using more concurrent threads
> Out of interest, does indexing time speed up much on 64-bit hardware?

I was able to speed up indexing on a 64-bit platform by taking advantage of the larger address space to parallelize the indexing process. One thread creates index segments with a set of RAMDirectories and another thread merges the segments to disk with 'addIndexes'. This resulted in a speed improvement of 27%.

Peter

On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote:
> Peter Keegan wrote:
>> I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it.
>
> Wow. That's fast.
>
> Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-)
>
> Daniel
>
> --
> Daniel Noll
> Nuix Australia Pty Ltd
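For readers who want to see the shape of the two-thread indexing arrangement Peter describes (one thread building in-memory segments, a second merging them into the on-disk index), here is a minimal sketch using only the JDK. It is not Lucene code: plain lists stand in for the RAMDirectory segments and for the on-disk index, `addAll` stands in for the 'addIndexes' merge, and the class name, method name, and queue capacity of 4 are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical two-stage pipeline: a builder thread and a merger thread. */
final class PipelinedIndexingSketch {

    // Sentinel segment that tells the merger to stop; compared by identity.
    private static final List<String> END = new ArrayList<>();

    static List<String> indexInPipeline(List<String> docs, int segmentSize) {
        // Bounded queue: the builder blocks if the merger falls behind.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);
        List<String> merged = new ArrayList<>(); // stand-in for the on-disk index

        Thread builder = new Thread(() -> {
            try {
                List<String> segment = new ArrayList<>(); // stand-in for a RAMDirectory
                for (String doc : docs) {
                    segment.add(doc.toLowerCase(Locale.ROOT)); // stand-in for analysis
                    if (segment.size() == segmentSize) {
                        queue.put(segment);
                        segment = new ArrayList<>();
                    }
                }
                if (!segment.isEmpty()) {
                    queue.put(segment);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread merger = new Thread(() -> {
            try {
                // Stand-in for merging each finished segment into the main index.
                for (List<String> seg = queue.take(); seg != END; seg = queue.take()) {
                    merged.addAll(seg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        builder.start();
        merger.start();
        try {
            builder.join();
            merger.join(); // join() makes the merger's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(indexInPipeline(Arrays.asList("A", "B", "C", "D", "E"), 2));
    }
}
```

The bounded queue is a design choice rather than something stated in the thread: it keeps the builder from running arbitrarily far ahead of the merger, which caps how much memory the in-flight segments can occupy.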
Re: Throughput doesn't increase when using more concurrent threads
I did some additional testing with Chris's patch and mine (based on Doug's note) vs. no patch and found that all 3 produced the same throughput - about 330 qps - over a longer period. So, there seems to be a point of diminishing returns to adding more cpus. The dual core Opterons (8 cpu) still win handily at 400 qps.

Peter
Re: Throughput doesn't increase when using more concurrent threads
Peter Keegan wrote:
> I did some additional testing with Chris's patch and mine (based on Doug's note) vs. no patch and found that all 3 produced the same throughput - about 330 qps - over a longer period.

Was CPU utilization 100%? If not, where do you think the bottleneck now is? Network? Or some other Java monitor contention?

Doug
Re: Throughput doesn't increase when using more concurrent threads
Chris,

Should this patch work against the current code base? I'm getting this error:

D:\lucene-1.9>patch -b -p0 -i nio-lucene-1.9.patch
patching file src/java/org/apache/lucene/index/CompoundFileReader.java
patching file src/java/org/apache/lucene/index/FieldsReader.java
missing header for unified diff at line 45 of patch
can't find file to patch at input line 45
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
| +47,9 @@
|  fieldsStream = d.openInput(segment + ".fdt");
|  indexStream = d.openInput(segment + ".fdx");
|
|+ fstream = new ThreadStream(fieldsStream);
|+ istream = new ThreadStream(indexStream);
|+
|  size = (int)(indexStream.length() / 8);
|  }
|
--------------------------

Thanks,
Peter
Re: Throughput doesn't increase when using more concurrent threads
Chris,

My apologies - this error was apparently caused by a file format mismatch (probably line endings).

Thanks,
Peter
Re: Throughput doesn't increase when using more concurrent threads
> 3. Use the ThreadLocal's FieldReader in the document() method.

As I understand it, this means that the document method no longer needs to be synchronized, right?

I've made these changes and it does appear to improve performance. Random snapshots of the stack traces show only an occasional lock in 'isDeleted'. Mostly, though, the threads are busy scoring and adding results to priority queues, which is great. I've included some sample stacks, below.

I'll report the new query rates after it has run for at least overnight, and I'd be happy to submit these changes to the lucene committers, if interested.

Peter

Sample stack traces:

QueryThread group 1,#8 prio=1 tid=0x002ce48eeb80 nid=0x6b87 runnable [0x43887000..0x43887bb0]
    at org.apache.lucene.search.FieldSortedHitQueue.lessThan(FieldSortedHitQueue.java:108)
    at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:61)
    at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:85)
    at org.apache.lucene.search.FieldSortedHitQueue.insert(FieldSortedHitQueue.java:92)
    at org.apache.lucene.search.TopFieldDocCollector.collect(TopFieldDocCollector.java:51)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
    at org.apache.lucene.search.Hits.<init>(Hits.java:52)
    at org.apache.lucene.search.Searcher.search(Searcher.java:62)

QueryThread group 1,#5 prio=1 tid=0x002ce4d659f0 nid=0x6b84 runnable [0x43584000..0x43584d30]
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:75)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:60)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
    at org.apache.lucene.search.Hits.<init>(Hits.java:52)
    at org.apache.lucene.search.Searcher.search(Searcher.java:62)

QueryThread group 1,#4 prio=1 tid=0x002ce10afd50 nid=0x6b83 runnable [0x43483000..0x43483db0]
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:46)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:56)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:101)
    at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:194)
    at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:144)
    at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
    at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:290)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
    at org.apache.lucene.search.Hits.<init>(Hits.java:52)
    at org.apache.lucene.search.Searcher.search(Searcher.java:62)

QueryThread group 1,#3 prio=1 tid=0x002ce48959f0 nid=0x6b82 runnable [0x43382000..0x43382e30]
    at java.util.LinkedList.listIterator(LinkedList.java:523)
    at java.util.AbstractList.listIterator(AbstractList.java:349)
    at java.util.AbstractSequentialList.iterator(AbstractSequentialList.java:250)
    at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:80)
    at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:225)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
    at org.apache.lucene.search.Hits.<init>(Hits.java:52)
    at org.apache.lucene.search.Searcher.search(Searcher.java:62)
Re: Throughput doesn't increase when using more concurrent threads
Peter,

I think this is similar to the patch in this bugzilla task:

http://issues.apache.org/bugzilla/show_bug.cgi?id=35838

the patch itself is

http://issues.apache.org/bugzilla/attachment.cgi?id=15757

(BTW does JIRA have a way to display the patch diffs?)

The above patch also has a change to SegmentReader to avoid synchronization on isDeleted(). However, with that patch, you no longer have the guarantee that one thread will immediately see deletions by another thread. This was fine for my purposes, and resulted in a big performance boost when there were deleted documents, but it may not be correct for others' needs.

-chris
Re: Throughput doesn't increase when using more concurrent threads
I ran a query performance tester against 8-cpu and 16-cpu Xeon servers (16/32 cpu hyperthreaded) on Linux. Here are the results:

8-cpu: 275 qps
16-cpu: 305 qps
(the dual-core Opteron servers are still faster)

Here is the stack trace of 8 of the 16 query threads during the test:

    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:281)
    - waiting to lock <0x002adf5b2110> (a org.apache.lucene.index.SegmentReader)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:83)
    at org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:146)
    at org.apache.lucene.search.Hits.doc(Hits.java:103)

SegmentReader.document is a synchronized method. I have one stored field (binary, uncompressed) with an average length of 0.5 KB. The retrieval of this stored field is within this synchronized code. Since I am using MMapDirectory, does this retrieval need to be synchronized?

Peter

On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote:
> Yonik,
>
> We're investigating both approaches. Yes, the resources (and permutations) are dizzying!
>
> Peter
>
> On 2/23/06, Yonik Seeley [EMAIL PROTECTED] wrote:
>> Wow, some resources! Would it be cheaper / more scalable to copy the index to multiple boxes and loadbalance requests across them?
>>
>> -Yonik
>>
>> On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote:
>>> Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next (32 with hyperthreading), on LinTel. I may give JRockit another go around then.
>>>
>>> Thanks,
>>> Peter
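As a back-of-the-envelope reading of the 8-cpu and 16-cpu figures Peter reports (275 and 305 qps), one can ask what serialized fraction of the work would explain such a small gain from doubling the cpus. The sketch below applies Amdahl's law, which nobody in the thread invokes, so treat it as an assumption: it models speedup as S(n) = 1 / (s + (1 - s) / n) and solves S(16) / S(8) = 305 / 275 for s. It ignores hyperthreading, memory bandwidth, and network limits, and the class and method names are made up for illustration.

```java
/** Rough Amdahl's-law estimate from two throughput measurements. */
final class AmdahlEstimate {

    /**
     * Solves S(n2) / S(n1) = r for the serial fraction s, where
     * S(n) = 1 / (s + (1 - s) / n) and r = qps2 / qps1.
     * Rearranging gives s = a / (1 - r + a) with a = r / n2 - 1 / n1.
     */
    static double serialFraction(int n1, int n2, double qps1, double qps2) {
        double r = qps2 / qps1;
        double a = r / n2 - 1.0 / n1;
        return a / (1.0 - r + a);
    }

    /** Limit of S(n) as n grows: the speedup ceiling under this model. */
    static double maxSpeedup(double serialFraction) {
        return 1.0 / serialFraction;
    }

    public static void main(String[] args) {
        double s = serialFraction(8, 16, 275.0, 305.0);
        System.out.printf("estimated serial fraction: %.3f%n", s);
        System.out.printf("speedup ceiling under this model: %.2fx%n", maxSpeedup(s));
    }
}
```

With these inputs the estimated serial fraction comes out at roughly one third, which is at least consistent with half of the query threads being parked on the SegmentReader monitor in the dump above. It is an estimate from two data points, not a measurement.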
Re: Throughput doesn't increase when using more concurrent threads
Peter Keegan wrote:
> SegmentReader.document is a synchronized method. I have one stored field (binary, uncompressed) with an average length of 0.5 KB. The retrieval of this stored field is within this synchronized code. Since I am using MMapDirectory, does this retrieval need to be synchronized?

Yes, since in FieldReader the file positions must be synchronized.

The way to avoid this would be to:

1. Add a clone() method to FieldReader that clones its two IndexInputs.
2. Add a ThreadLocal to SegmentReader whose value is a cloned FieldReader.
3. Use the ThreadLocal's FieldReader in the document() method.

TermInfosReader has a similar optimization, using a ThreadLocal containing a SegmentTermEnum for each thread.

Doug
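Doug's three steps can be illustrated outside Lucene with a small JDK-only sketch. The classes below are hypothetical stand-ins, not Lucene's own: `PositionedReader` plays the role of a reader whose only mutable state is its file position, and `Store` plays the role of the segment reader that hands each thread its own clone. The sketch also uses `ThreadLocal.withInitial`, which arrived in Java 8 and so postdates this thread; that is a convenience, and the idea works equally with a plain `ThreadLocal` whose value is set on first use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalReaderSketch {

    /** Hypothetical reader: shared read-only bytes plus a mutable position. */
    static final class PositionedReader implements Cloneable {
        private final byte[] data; // shared by all clones, never modified
        private int position;      // private to each clone

        PositionedReader(byte[] data) { this.data = data; }

        void seek(int pos) { position = pos; }

        byte readByte() { return data[position++]; }

        // Step 1: clone() copies the position holder, not the underlying bytes.
        @Override public PositionedReader clone() {
            try {
                return (PositionedReader) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Hypothetical owner of the reader, analogous to the segment reader. */
    static final class Store {
        private final PositionedReader prototype;
        // Step 2: each thread lazily receives its own clone.
        private final ThreadLocal<PositionedReader> perThread;

        Store(byte[] data) {
            prototype = new PositionedReader(data);
            perThread = ThreadLocal.withInitial(prototype::clone);
        }

        // Step 3: no 'synchronized' here; seek and read touch only this thread's clone.
        byte document(int n) {
            PositionedReader reader = perThread.get();
            reader.seek(n);
            return reader.readByte();
        }
    }

    /** Reads concurrently and counts results that do not match the stored byte. */
    static int countMismatches(int threads, int readsPerThread) {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Store store = new Store(data);
        AtomicInteger mismatches = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int seed = t;
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < readsPerThread; i++) {
                        int n = (seed * 31 + i) % data.length;
                        if (store.document(n) != (byte) n) mismatches.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8, 100_000));
    }
}
```

The trade-off in this design is memory for contention: every thread that ever calls `document` keeps its own clone alive, which is cheap when a clone is only a position counter over shared data but worth checking if the real reader holds buffers of its own.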
Re: Throughput doesn't increase when using more concurrent threads
Can nutch be made to use lucene query parser?

Rgds
Prabhu

On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote:
> Hi Otis,
>
> The Lucene server is actually CPU and network bound, as the index gets memory mapped pretty quickly. There is little disk activity observed.
>
> I was also able to run the server on a Sun box last night with 4 dual core opterons (same Linux and JVM) and I'm observing query rates of 400 qps! Has Linux been optimized to run on this hardware? I imagine that Sun's JVM has been.
>
> Peter
>
> On 2/22/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> Some things that could be different:
>> - thread scheduling (shouldn't make too much of a difference though)
>> ---
>> I would also play with disk IO schedulers, if you can. CentOS is based on RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO schedulers that, according to articles I recently read, can have an impact on disk read/write performance. These schedulers can be specified at mount time, I believe, and maybe at boot time (kernel line in Grub/LILO).
>>
>> Otis
>>
>> On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote:
>>> I am doing a performance comparison of Lucene on Linux vs Windows.
>>>
>>> I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the jvm with -Xmx16000M. Peak memory usage of the jvm on Linux is about 6GB and 7.8GB on Windows.
>>>
>>> I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical.
>>>
>>> I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated.
>>>
>>> Peter
>>>
>>> On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote:
>>>> I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet.
>>>>
>>>> Peter
Re: Throughput doesn't increase when using more concurrent threads
I would give the IBM or blackdown JVM a try on linux - I've seen pretty wide variance in their speed on different operations. Sometimes better than Sun, sometimes worse - it depended on the task. (I did some ad hoc tests at one point that showed Sun was faster for indexing, but IBM was faster for querying - but that was quite a while ago.)

Dan

--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
Re: Throughput doesn't increase when using more concurrent threads
Hi, Please ask on the Nutch mailing list (I answered your question in general@ already). Also, please don't steal other people's threads - it's considered impolite for obvious reasons. Otis - Original Message From: Raghavendra Prabhu [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, February 23, 2006 11:10:11 AM Subject: Re: Throughput doesn't increase when using more concurrent threads Can Nutch be made to use the Lucene query parser? Rgds Prabhu On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote: Hi Otis, The Lucene server is actually CPU and network bound, as the index gets memory mapped pretty quickly. There is little disk activity observed. I was also able to run the server on a Sun box last night with 4 dual core Opterons (same Linux and JVM) and I'm observing query rates of 400 qps! Has Linux been optimized to run on this hardware? I imagine that Sun's JVM has been. Peter On 2/22/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) --- I would also play with disk IO schedulers, if you can. CentOS is based on RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO schedulers that, according to articles I recently read, can have an impact on disk read/write performance. These schedulers can be specified at mount time, I believe, and maybe at boot time (kernel line in Grub/LILO). Otis On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote: I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows Server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the JVM with -Xmx16000M. Peak memory usage of the JVM on Linux is about 6GB and 7.8GB on Windows. 
I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical. I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated. Peter On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote: I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Hi, Sorry for the trouble. I was sending my first mail to the group and replied to this thread, and then later on sent a direct mail. I would like to apologise for the inconvenience caused. Rgds Prabhu On 2/23/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, Please ask on the Nutch mailing list (I answered your question in general@ already). Also, please don't steal other people's threads - it's considered impolite for obvious reasons. Otis - Original Message From: Raghavendra Prabhu [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, February 23, 2006 11:10:11 AM Subject: Re: Throughput doesn't increase when using more concurrent threads Can Nutch be made to use the Lucene query parser? Rgds Prabhu On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote: Hi Otis, The Lucene server is actually CPU and network bound, as the index gets memory mapped pretty quickly. There is little disk activity observed. I was also able to run the server on a Sun box last night with 4 dual core Opterons (same Linux and JVM) and I'm observing query rates of 400 qps! Has Linux been optimized to run on this hardware? I imagine that Sun's JVM has been. Peter On 2/22/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) --- I would also play with disk IO schedulers, if you can. CentOS is based on RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO schedulers that, according to articles I recently read, can have an impact on disk read/write performance. These schedulers can be specified at mount time, I believe, and maybe at boot time (kernel line in Grub/LILO). Otis On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote: I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows Server 2003 Enterprise Edition x64. 
Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the JVM with -Xmx16000M. Peak memory usage of the JVM on Linux is about 6GB and 7.8GB on Windows. I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical. I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated. Peter On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote: I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. 
Re: Throughput doesn't increase when using more concurrent threads
We discovered that the kernel was only using 8 CPUs. After recompiling for 16 (8+hyperthreads), it looks like the query rate will settle in around 280-300 qps. Much better, although still quite a bit slower than the Opteron. Peter On 2/22/06, Yonik Seeley [EMAIL PROTECTED] wrote: Hmmm, not sure what that could be. You could try using the default FSDir instead of MMapDir to see if the differences are there. Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) - synchronization workings - page replacement policy... how to figure out what pages to swap in and which to swap out, especially of the memory mapped files. You could also try a profiler on both platforms to try and see where the difference is. -Yonik On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote: I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows Server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the JVM with -Xmx16000M. Peak memory usage of the JVM on Linux is about 6GB and 7.8GB on Windows. I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical. I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated. Peter On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote: I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. 
Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
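A quick way to catch the "kernel only using 8 CPUs" situation from the Java side is to ask the JVM how many processors it can see before starting a benchmark. The sketch below is only an illustration: the class name `CpuCheck`, its helper methods, and the default expectation of 16 (8 CPUs plus hyperthreads, as in the message above) are assumptions made for this example and are not part of Lucene. `Runtime.availableProcessors()` is the standard JDK call.

```java
public class CpuCheck {
    /** Number of processors the JVM reports as available for scheduling threads. */
    static int visibleProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** True when the JVM sees fewer processors than the hardware spec suggests. */
    static boolean fewerThanExpected(int expected) {
        return visibleProcessors() < expected;
    }

    public static void main(String[] args) {
        // Hypothetical default: 16 logical CPUs (8 physical + hyperthreads).
        int expected = args.length > 0 ? Integer.parseInt(args[0]) : 16;
        System.out.println("JVM sees " + visibleProcessors() + " processors");
        if (fewerThanExpected(expected)) {
            System.out.println("Fewer than the expected " + expected
                    + "; the kernel or OS configuration may be limiting CPU use");
        }
    }
}
```

If the reported number is lower than the machine's logical CPU count, adding more searcher threads is unlikely to raise throughput until the OS-level limit is fixed.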
Re: Throughput doesn't increase when using more concurrent threads
Chris, I tried JRockit a while back on 8-cpu/Windows and it was slower than Sun's. Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next (32 with hyperthreading), on LinTel. I may give JRockit another go around then. Thanks, Peter On 2/23/06, Chris Lamprecht [EMAIL PROTECTED] wrote: Peter, Have you given JRockit JVM a try? I've seen it help throughput compared to Sun's JVM on a dual Xeon/Linux machine, especially with concurrency (up to 6 concurrent searches happening). I'm curious to see if it makes a difference for you. -chris On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote: We discovered that the kernel was only using 8 CPUs. After recompiling for 16 (8+hyperthreads), it looks like the query rate will settle in around 280-300 qps. Much better, although still quite a bit slower than the Opteron. Peter On 2/22/06, Yonik Seeley [EMAIL PROTECTED] wrote: Hmmm, not sure what that could be. You could try using the default FSDir instead of MMapDir to see if the differences are there. Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) - synchronization workings - page replacement policy... how to figure out what pages to swap in and which to swap out, especially of the memory mapped files. You could also try a profiler on both platforms to try and see where the difference is. -Yonik On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote: I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows Server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the JVM with -Xmx16000M. Peak memory usage of the JVM on Linux is about 6GB and 7.8GB on Windows. I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. 
At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical. I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated. Peter On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote: I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Wow, some resources! Would it be cheaper / more scalable to copy the index to multiple boxes and loadbalance requests across them? -Yonik On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote: Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next (32 with hyperthreading), on LinTel. I may give JRockit another go around then. Thanks, Peter - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Yonik, We're investigating both approaches. Yes, the resources (and permutations) are dizzying! Peter On 2/23/06, Yonik Seeley [EMAIL PROTECTED] wrote: Wow, some resources! Would it be cheaper / more scalable to copy the index to multiple boxes and loadbalance requests across them? -Yonik On 2/23/06, Peter Keegan [EMAIL PROTECTED] wrote: Since I seem to be cpu-bound right now, I'll be trying a 16-cpu system next (32 with hyperthreading), on LinTel. I may give JRockit another go around then. Thanks, Peter - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Hmmm, not sure what that could be. You could try using the default FSDir instead of MMapDir to see if the differences are there. Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) - synchronization workings - page replacement policy... how to figure out what pages to swap in and which to swap out, esp of the memory mapped files. You could also try a profiler on both platforms to try and see where the difference is. -Yonik On 2/22/06, Peter Keegan [EMAIL PROTECTED] wrote: I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene server is using MMapDirectory. I'm running the jvm with -Xmx16000M. Peak memory usage of the jvm on Linux is about 6GB and 7.8GB on windows. I'm observing query rates of 330 queries/sec on the Wintel server, but only 200 qps on the Linux box. At first, I suspected a network bottleneck, but when I 'short-circuited' Lucene, the query rates were identical. I suspect that there are some things to be tuned in Linux, but I'm not sure what. Any advice would be appreciated. Peter On 1/30/06, Peter Keegan [EMAIL PROTECTED] wrote: I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? 
I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
I cranked up the dial on my query tester and was able to get the rate up to 325 qps. Unfortunately, the machine died shortly thereafter (memory errors :-( ) Hopefully, it was just a coincidence. I haven't measured 64-bit indexing speed, yet. Peter On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not working on Intel? I was under the impression that it supported the AMD64 instruction set, and that Intel's 64-bit processors basically cloned AMD's instruction set. I really hope this isn't the case, because it's going to be one hell of a caveat if we end up telling customers yes, we support 64-bit AMD, but not 64-bit Intel. Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does indexing time speed up much on 64-bit hardware? I'm particularly interested in this side of things because for our own application, any query response under half a second is good enough, but the indexing side could always be faster. :-) Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not working on Intel? Go back and look at my response to the message you quoted :-) The short answer is yes, it will work on Intel. I was under the impression that it supported the AMD64 instruction set, and that Intel's 64-bit processors basically cloned AMD's instruction set. Pretty much, but it's never that simple (and wasn't for 32 bit mode either) http://en.wikipedia.org/wiki/EM64T#Differences_between_AMD64_and_EM64T Now that they have both been out a while, compilers generally produce code that works on both. Tricky things like JVMs and especially kernels needed explicit support. I really hope this isn't the case, because it's going to be one hell of a caveat if we end up telling customers yes, we support 64-bit AMD, but not 64-bit Intel. Support is a different issue. It may work, but it may or may not be a supported platform of the JVM vendor. -Yonik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Yonik Seeley wrote: On 1/29/06, Daniel Noll [EMAIL PROTECTED] wrote: Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not working on Intel? Go back and look at my response to the message you quoted :-) The short answer is yes, it will work on Intel. Ah. Okay, sorry about that. The only response I saw was the one about JRockit supporting it. Support is a different issue. It may work, but it may or may not be a supported platform of the JVM vendor. True enough. At the moment our most likely move is to support Sun's 64-bit JVM on Windows, but not other vendors' JVMs (i.e., we'll support whatever JVMs we redistribute with our own app.) Of course, this will only come once we claim to support 64-bit hardware... I'm sure there are many things still yet to be done there, such as making sure all our JNI libraries will compile properly for 64-bit Windows. Daniel -- Daniel Noll Nuix Australia Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Phone: (02) 9280 0699 Fax: (02) 9212 6902 This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. Making multiple IndexSearchers and FSDirectories didn't help because in the back end, Lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. But this would not be a great solution, since file handle limits would soon become a problem. Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
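The idea Doug describes, an nio-based reader that needs no synchronized block, can be sketched with `FileChannel.read(ByteBuffer, long)`, which reads at an explicit offset and leaves the channel's own position untouched, so callers share no seek pointer. This is only a minimal sketch under assumptions: `PositionalReader` and `readAt` are hypothetical names invented for this example, it is not the LUCENE-414 code or any Lucene API, and it uses `java.nio.file`, which is newer than the Java 1.5 discussed in the thread. Whether the operating system actually services such reads in parallel depends on the JDK and platform.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one shared channel, positional reads, no shared file pointer. */
public class PositionalReader implements AutoCloseable {
    private final FileChannel channel;

    public PositionalReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Reads exactly len bytes starting at offset without moving the channel position. */
    public byte[] readAt(long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos); // positional read: no seek, no lock in this class
            if (n < 0) {
                throw new EOFException("reached end of file at position " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Because each call carries its own offset, several searcher threads can call `readAt` on the same instance without the clone-and-synchronize pattern described above for FSDirectory input streams.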
Re: Throughput doesn't increase when using more concurrent threads
Paul, I tried this but it ran out of memory trying to read the 500Mb .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)) I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Ray, The throughput is worse with NioFSDirectory than with the FSDirectory (patched and unpatched). The bottleneck still seems to be synchronization, this time in NioFile.getChannel (7 of the 8 threads were blocked there during one snapshot). I tried this with 4 and 8 channels. The throughput with the patched FSDirectory was about the same as before the patch. Thanks, Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. Making multiple IndexSearchers and FSDirectories didn't help because in the back end, Lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. 
This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. But this would not be a great solution, since file handle limits would soon become a problem. Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?) We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons, Sun Java 1.5) -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Paul, I tried this but it ran out of memory trying to read the 500Mb .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)) I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Throughput doesn't increase when using more concurrent threads
I'd love to try this, but I'm not aware of any 64-bit JVMs for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which is pretty impressive. Another way around the concurrency limit is to run multiple JVMs. The throughput of each is less, but the aggregate throughput is higher. Peter On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM)? We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons, Sun Java 1.5) -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Paul, I tried this but it ran out of memory trying to read the 500MB .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)). I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards, Paul Elschot
Re: Throughput doesn't increase when using more concurrent threads
BEA JRockit supports both AMD64 and Intel's EM64T (basically renamed AMD64): http://www.bea.com/framework.jsp?CNT=index.htmFP=/content/products/jrockit/ as does Sun's Java 1.5 for the Windows AMD64 platform. They advertise AMD64, presumably because that's what their servers use, but it should work on Intel's x86_64 (EM64T) also. The release notes have the following: "With the release, J2SE support for Windows 64-bit has progressed from release candidate to final release. This version runs on AMD64/EM64T 64-bit mode machines with Windows Server 2003 x64 Editions." Of course, if the platform is up to you, I'd choose Linux :-) -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I'd love to try this, but I'm not aware of any 64-bit JVMs for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which is pretty impressive. Another way around the concurrency limit is to run multiple JVMs. The throughput of each is less, but the aggregate throughput is higher. Peter On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM)? We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons, Sun Java 1.5) -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Paul, I tried this but it ran out of memory trying to read the 500MB .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)). I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. 
Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards, Paul Elschot
Re: Throughput doesn't increase when using more concurrent threads
Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug
Re: Throughput doesn't increase when using more concurrent threads
Dumb question: does the 64-bit compiler (javac) generate different code than the 32-bit version, or is it just the jvm that matters? My reported speedups were solely from using the 64-bit jvm with jar files from the 32-bit compiler. Peter On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: Nice speedup! The extra registers in 64 bit mode may have helped a little too. -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :)
Re: Throughput doesn't increase when using more concurrent threads
There is no difference in bytecode... the whole difference is just in the underlying JVM. -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Dumb question: does the 64-bit compiler (javac) generate different code than the 32-bit version, or is it just the jvm that matters? My reported speedups were solely from using the 64-bit jvm with jar files from the 32-bit compiler. Peter
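Since the same jar files run unchanged on either JVM, one way to confirm which kind of JVM a test actually ran on is to ask it at startup. The following is a minimal sketch, not something posted in the thread: the class name JvmInfo is made up for illustration, and the "sun.arch.data.model" property is specific to Sun/HotSpot-derived JVMs, so it may be missing elsewhere (the code falls back to "unknown").

```java
// Hypothetical helper, not part of Lucene: reports what the running JVM says about itself.
class JvmInfo {

    static String describe() {
        // Standard property: the architecture the JVM reports, e.g. "x86" or "amd64".
        String arch = System.getProperty("os.arch");
        // Assumption: this property exists on Sun/HotSpot-derived JVMs ("32" or "64").
        // Other JVMs may not define it, hence the default value.
        String dataModel = System.getProperty("sun.arch.data.model", "unknown");
        // Upper bound on the heap, as set by -Xmx or the JVM default.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        return "os.arch=" + arch + ", data model=" + dataModel + ", max heap MB=" + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line next to each benchmark result makes it harder to mix up runs from the 32-bit and 64-bit JVMs.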
Re: Throughput doesn't increase when using more concurrent threads
Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :) On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I tried the AMD64-bit JVM from Sun with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Thanks all very much. Peter On 1/26/06, Doug Cutting [EMAIL PROTECTED] wrote: Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug
Re: Throughput doesn't increase when using more concurrent threads
Ray, The short answer is that you can make Lucene blazingly fast by using advice and design principles mentioned in this forum and of course reading 'Lucene in Action'. For example, use a 'content' field for searching all fields (vs multi-field search), put all your stored data in one field, understand the cost of numeric search and sorting. On the platform side, go multi-CPU and of course 64-bit if possible :) Also, I would venture to guess that a lot of search bottlenecks have nothing to do with Lucene, but rather with the infrastructure around it. For example, how does your client interface to the search engine? My results use a plain socket interface between client and server (one connection for queries, another for results), using a simple query/results data format. Introducing other web infrastructures invites degradation in performance, too. I've a bit of experience with search engines, but I'm obviously still learning thanks to this group. Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :) On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I tried the AMD64-bit JVM from Sun with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Thanks all very much. Peter On 1/26/06, Doug Cutting [EMAIL PROTECTED] wrote: Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug
Re: Throughput doesn't increase when using more concurrent threads
Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches, or use different FSDirectory implementations? Thanks! ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Ray, The short answer is that you can make Lucene blazingly fast by using advice and design principles mentioned in this forum and of course reading 'Lucene in Action'. For example, use a 'content' field for searching all fields (vs multi-field search), put all your stored data in one field, understand the cost of numeric search and sorting. On the platform side, go multi-CPU and of course 64-bit if possible :) Also, I would venture to guess that a lot of search bottlenecks have nothing to do with Lucene, but rather with the infrastructure around it. For example, how does your client interface to the search engine? My results use a plain socket interface between client and server (one connection for queries, another for results), using a simple query/results data format. Introducing other web infrastructures invites degradation in performance, too. I've a bit of experience with search engines, but I'm obviously still learning thanks to this group. Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :) On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I tried the AMD64-bit JVM from Sun with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Thanks all very much. Peter On 1/26/06, Doug Cutting [EMAIL PROTECTED] wrote: Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. 
Doug
Re: Throughput doesn't increase when using more concurrent threads
Ray, The 135 qps rate was using the standard FSDirectory in 1.9. Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches, or use different FSDirectory implementations? Thanks! ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Ray, The short answer is that you can make Lucene blazingly fast by using advice and design principles mentioned in this forum and of course reading 'Lucene in Action'. For example, use a 'content' field for searching all fields (vs multi-field search), put all your stored data in one field, understand the cost of numeric search and sorting. On the platform side, go multi-CPU and of course 64-bit if possible :) Also, I would venture to guess that a lot of search bottlenecks have nothing to do with Lucene, but rather with the infrastructure around it. For example, how does your client interface to the search engine? My results use a plain socket interface between client and server (one connection for queries, another for results), using a simple query/results data format. Introducing other web infrastructures invites degradation in performance, too. I've a bit of experience with search engines, but I'm obviously still learning thanks to this group. Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :) On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I tried the AMD64-bit JVM from Sun with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Thanks all very much. 
Peter On 1/26/06, Doug Cutting [EMAIL PROTECTED] wrote: Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug
Re: Throughput doesn't increase when using more concurrent threads
This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots of stack traces showed most threads blocked in the synchronization in: FSIndexInput.readInternal(), when the thread count exceeded 4. Peter On 11/22/05, Oren Shir [EMAIL PROTECTED] wrote: Hi, There are two synchronization points: on the stream and on the reader. Using different FSDirectories and IndexReaders should solve this. I'll let you know once I code it. Right now I'm checking if making my Documents store less data will move the bottleneck to some other place. Thanks again, Oren Shir On 11/21/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. 
Making multiple IndexSearchers and FSDirectories didn't help because in the back end, lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. But this would not be a great solution, since file handle limits would soon become a problem. Doug
Re: Throughput doesn't increase when using more concurrent threads
Thanks Peter, that's useful info. Just out of curiosity, what kind of box is this? What CPUs? -Yonik On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote: This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots of stack traces showed most threads blocked in the synchronization in: FSIndexInput.readInternal(), when the thread count exceeded 4. Peter On 11/22/05, Oren Shir [EMAIL PROTECTED] wrote: Hi, There are two synchronization points: on the stream and on the reader. Using different FSDirectories and IndexReaders should solve this. I'll let you know once I code it. Right now I'm checking if making my Documents store less data will move the bottleneck to some other place. Thanks again, Oren Shir On 11/21/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. 
Making multiple IndexSearchers and FSDirectories didn't help because in the back end, lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. But this would not be a great solution, since file handle limits would soon become a problem. Doug
Re: Throughput doesn't increase when using more concurrent threads
Peter Keegan wrote: This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots of stack traces showed most threads blocked in the synchronization in: FSIndexInput.readInternal(), when the thread count exceeded 4. Was this with a compound or non-compound format index? The non-compound should fare slightly better, since there are more file handles per index. Did you try using MMapDirectory? This should have no i/o concurrency limits, but, on 32-bit systems, only works with indexes less than a few GB. Doug
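To illustrate why a memory-mapped file has no shared file pointer to lock, here is a minimal sketch using plain java.nio. It is not Lucene's MMapDirectory code; the class and method names (MappedReads, mapReadOnly, read) are made up for this example. Each reader works on its own duplicate of the mapped buffer, so positions are independent while the bytes are shared. A single mapping in Java is limited to Integer.MAX_VALUE bytes (about 2 GB), and on a 32-bit JVM the process address space limits how much can be mapped in total, which fits the "less than a few GB" caveat above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; these are not Lucene classes.
class MappedReads {

    // Maps the whole file read-only. FileChannel.map rejects sizes above
    // Integer.MAX_VALUE, so larger files would need several mappings.
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Each caller reads through its own duplicate: same underlying bytes,
    // private position and limit, so no lock is taken here.
    static byte[] read(ByteBuffer mapped, int position, int length) {
        ByteBuffer view = mapped.duplicate();
        view.position(position);
        byte[] dest = new byte[length];
        view.get(dest);
        return dest;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mapped-reads", ".bin");
        path.toFile().deleteOnExit();
        Files.write(path, new byte[] {10, 11, 12, 13, 14, 15});
        MappedByteBuffer mapped = mapReadOnly(path);
        byte[] slice = read(mapped, 2, 3);
        System.out.println(slice[0] + " " + slice[1] + " " + slice[2]);
    }
}
```

Whether this beats buffered file reads for a given index depends on how much of the index fits in the file system cache and address space, which is what the thread goes on to test.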
Re: Throughput doesn't increase when using more concurrent threads
It's a 3GHz Intel box with Xeon processors, 64GB ram :) Peter On 1/25/06, Yonik Seeley [EMAIL PROTECTED] wrote: Thanks Peter, that's useful info. Just out of curiosity, what kind of box is this? What CPUs? -Yonik On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote: This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or greater than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots of stack traces showed most threads blocked in the synchronization in: FSIndexInput.readInternal(), when the thread count exceeded 4. Peter On 11/22/05, Oren Shir [EMAIL PROTECTED] wrote: Hi, There are two synchronization points: on the stream and on the reader. Using different FSDirectories and IndexReaders should solve this. I'll let you know once I code it. Right now I'm checking if making my Documents store less data will move the bottleneck to some other place. Thanks again, Oren Shir On 11/21/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. 
Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. Making multiple IndexSearchers and FSDirectories didn't help because in the back end, lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. But this would not be a great solution, since file handle limits would soon become a problem. Doug
Re: Throughput doesn't increase when using more concurrent threads
On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote: It's a 3GHz Intel box with Xeon processors, 64GB ram :) Nice! Xeon processors are normally hyperthreaded. On a linux box, if you cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU system. Are you positive you have 8 physical Xeon processors? -Yonik
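The /proc/cpuinfo check can be scripted. The sketch below is an illustration only, assuming the x86 Linux layout in which each logical CPU has a "processor" entry and a "physical id" entry naming its physical package; other architectures and kernels may format the file differently. The class name CpuInfoCount is made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts logical CPUs and distinct physical packages
// from text in the x86 Linux /proc/cpuinfo format.
class CpuInfoCount {

    // Returns {logical processors, distinct "physical id" values}.
    static int[] count(String cpuinfo) {
        int logical = 0;
        Set<String> packages = new HashSet<>();
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator lines between processors
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (key.equals("processor")) {
                logical++;
            } else if (key.equals("physical id")) {
                packages.add(value);
            }
        }
        return new int[] {logical, packages.size()};
    }

    public static void main(String[] args) throws Exception {
        // Only meaningful on Linux; the path does not exist elsewhere.
        byte[] raw = Files.readAllBytes(Paths.get("/proc/cpuinfo"));
        int[] counts = count(new String(raw, StandardCharsets.UTF_8));
        System.out.println("logical=" + counts[0] + " physical packages=" + counts[1]);
        // The JVM reports logical processors, so hyperthreads are included here too.
        System.out.println("JVM sees " + Runtime.getRuntime().availableProcessors());
    }
}
```

On a hyperthreaded box the first number would be larger than the second, which is the situation Yonik is asking about.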
Re: Throughput doesn't increase when using more concurrent threads
Yes, it's hyperthreaded (16 cpus show up in task manager - the box is running 2003). I plan to turn off hyperthreading to see if it has any effect. Peter On 1/25/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote: It's a 3GHz Intel box with Xeon processors, 64GB ram :) Nice! Xeon processors are normally hyperthreaded. On a linux box, if you cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU system. Are you positive you have 8 physical Xeon processors? -Yonik
Re: Throughput doesn't increase when using more concurrent threads
Hi, There are two synchronization points: on the stream and on the reader. Using different FSDirectories and IndexReaders should solve this. I'll let you know once I code it. Right now I'm checking if making my Documents store less data will move the bottleneck to some other place. Thanks again, Oren Shir On 11/21/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out to be that in the back end of the FSDirectory class I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up in front of this synchronized block and our CPU time wound up being spent thrashing between blocked threads instead of doing anything useful. This is correct. In Lucene, multiple streams per file are created by cloning, and all clones of an FSDirectory input stream share a RandomAccessFile and must synchronize input from it. MMapDirectory does not have this limitation. If your indexes are less than a few GB or you are using 64-bit hardware, then MMapDirectory should work well for you. Otherwise it would be simple to write an nio-based Directory that does not use mmap that is also unsynchronized. Such a contribution would be welcome. Making multiple IndexSearchers and FSDirectories didn't help because in the back end, lucene consults a singleton HashMap of some kind (don't remember implementation) that maintained a single FSDirectory for any given index being accessed from the JVM... multiple calls to FSDirectory.getDirectory actually return the same FSDirectory object with synchronization at the same point. This does not make sense to me. FSDirectory does keep a cache of FSDirectory instances, but i/o should not be synchronized on these. One should be able to open multiple input streams on the same file from an FSDirectory. 
But this would not be a great solution, since file handle limits would soon become a problem. Doug
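The contention Doug describes, and the nio alternative he suggests, can be sketched with plain JDK classes. This is an illustration under stated assumptions, not the FSDirectory source or any posted patch: SharedFileReads, readLocked and readPositional are hypothetical names. The first method mimics clones sharing one RandomAccessFile, where seek and read must happen together under a lock. The second uses FileChannel's positional read, which takes the offset as an argument and does not move the channel's position, so callers need no lock of their own.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; these are not Lucene classes.
class SharedFileReads {

    // Shared-file pattern: seek + read is a two-step operation on one file
    // pointer, so every reader has to hold the same lock while it runs.
    static int readLocked(RandomAccessFile file, long position, byte[] dest) throws IOException {
        synchronized (file) {
            file.seek(position);
            return file.read(dest, 0, dest.length);
        }
    }

    // Positional read: the offset travels with the call and the channel's own
    // position is left untouched, so there is no shared pointer to guard.
    // May return fewer bytes than requested; real code would loop.
    static int readPositional(FileChannel channel, long position, byte[] dest) throws IOException {
        return channel.read(ByteBuffer.wrap(dest), position);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("shared-reads", ".bin");
        path.toFile().deleteOnExit();
        Files.write(path, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        byte[] viaLock = new byte[4];
        byte[] viaChannel = new byte[4];
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r");
             FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            readLocked(file, 3, viaLock);
            readPositional(channel, 3, viaChannel);
        }
        System.out.println(java.util.Arrays.equals(viaLock, viaChannel));
    }
}
```

How much the positional variant helps in practice depends on the JVM and operating system, since the JDK may still serialize such reads internally on some platforms.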
Re: Throughput doesn't increase when using more concurrent threads
Oren Shir wrote: I tested this in version 1.4.3 and 1.9rc1, and they are both the same in this aspect. 1.9rc1 is faster, but does not benefit from multi threading. Some newbie questions I have: does 1.4.3 benefit from multi-threading? Is 1.9 the version in the source repository? _gk
Re: Throughput doesn't increase when using more concurrent threads
This is expected behavior: you are probably quickly becoming CPU bound (which isn't a bad thing). More threads only help when some threads are waiting on IO, or if you actually have a lot of CPUs in the box. -Yonik Now hiring -- http://forms.cnet.com/slink?231706 On 11/21/05, Oren Shir [EMAIL PROTECTED] wrote: Hi, I tried stressing Lucene in a controlled environment: one static IndexSearcher for an index that doesn't change, and in the same process I create a number of Threads that call this Searcher concurrently for a limited time. I expected the number of successful queries to increase when using more threads, but this is not the case. From 1 thread to 10 I see a 25% increase, but from 10 threads to 100 there is no change, only the average response time increases. The same goes for 200 threads. I tried RAMDirectory and FSDirectory, and the behavior is the same. I extract the first 100 results from the Hits object, but on RAMDirectory this should be insignificant, right? I tested this in version 1.4.3 and 1.9rc1, and they are both the same in this aspect. 1.9rc1 is faster, but does not benefit from multi threading. Did anyone see other behaviour? Will it be better to dedicate a searcher for each thread (maybe http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ThreadLocal.html)? Thanks, Oren Shir
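A stress test of the kind Oren describes can be sketched as below. This is a generic harness, not Oren's code: ThroughputProbe and countCompleted are hypothetical names, and the task in main is a CPU-only stand-in for a search call, since the actual test code was not posted. With a CPU-bound task like this one, the completed-call count would be expected to stop growing once the thread count reaches the number of available processors, which is the behavior Yonik calls expected.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: counts how many times a task completes when it is
// run in a loop on a fixed number of threads for a fixed wall-clock window.
class ThroughputProbe {

    static long countCompleted(final Runnable task, int threads, long millis)
            throws InterruptedException {
        final AtomicLong completed = new AtomicLong();
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Subtraction keeps the comparison safe if nanoTime wraps around.
                while (System.nanoTime() - deadline < 0) {
                    task.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 10_000, TimeUnit.MILLISECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a query: pure CPU work, no IO and no shared locks.
        Runnable fakeQuery = () -> {
            long h = 0;
            for (int i = 0; i < 200_000; i++) {
                h = h * 31 + i;
            }
            if (h == 42) {
                System.out.print(""); // keeps the loop from being optimized away
            }
        };
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cpus);
        for (int threads : new int[] {1, 2, 4, 8, 16}) {
            long done = countCompleted(fakeQuery, threads, 2000);
            System.out.println(threads + " threads: " + (done / 2) + " calls/sec");
        }
    }
}
```

Replacing the stand-in with a real search call and comparing the curve against the CPU-only one is a quick way to tell CPU saturation apart from lock contention: with contention, throughput flattens or drops well before the processors are fully used.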
Re: Throughput doesn't increase when using more concurrent threads
gekkokid, "does 1.4.3 benefit from multi-threading?" Sorry for not being clear. My tests show that neither version benefits from multi-threading, but it is possible that I'm CPU bound, as Yonik kindly reminded me. "is 1.9 the version in the source repository?" 1.9 is the version in the source repository. It is rather sad if 10 threads reach the CPU limit. I'll check it and get back to you. Thanks, Oren Shir
Re: Throughput doesn't increase when using more concurrent threads
On 11/21/05, Oren Shir [EMAIL PROTECTED] wrote: It is rather sad if 10 threads reach the CPU limit. I'll check it and get back to you. It's about performance and throughput though, not about the number of threads it takes to reach saturation. In a 2 CPU box, I would say that the ideal situation is where it only takes two threads to reach 100% CPU utilization. Normally it takes more because of some kind of IO (disk or network). -Yonik