Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Paul Elschot
On Wednesday 25 January 2006 20:51, Peter Keegan wrote: The index is non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big - 3.5 GB (1.3GB is term vectors) Peter You could also give this a try: http://issues.apache.org/jira/browse/LUCENE-283 Regards,

Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
I am attempting to prune an index by getting each document in turn and then checking/deleting it: IndexReader ir = IndexReader.open(path); for(int i=0;i<ir.numDocs();i++) { Document doc = ir.document(i); if(thisDocShouldBeDeleted(doc)) { ir.delete(docNum); // - I

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Paul Elschot
On Thursday 26 January 2006 09:15, Chun Wei Ho wrote: I am attempting to prune an index by getting each document in turn and then checking/deleting it: IndexReader ir = IndexReader.open(path); for(int i=0;i<ir.numDocs();i++) { Document doc = ir.document(i);

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting [EMAIL PROTECTED] wrote: Jay Booth wrote: I had a similar problem with threading, the problem turned out

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chun Wei Ho
Hi, Thanks for the help, just a few more questions: On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Thursday 26 January 2006 09:15, Chun Wei Ho wrote: I am attempting to prune an index by getting each document in turn and then checking/deleting it: IndexReader ir =

encoding

2006-01-26 Thread arnaudbuffet
Hello, I have a problem with data I am trying to index with Lucene. I browse a directory and index text from different types of files through parsers. For text files, the data could be in different languages and so in different encodings. If data are in Turkish for example, all special characters and accents are

Range number queries

2006-01-26 Thread Mike Streeton
For the recent questions about this, here are a couple of methods for encoding/decoding long values that will be sorted into order by a range query: public static String encodeLong(long num) { String hex = Long.toHexString(num < 0 ? Long.MAX_VALUE - (0xL ^ num) : num);
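The snippet above is cut off, so the following is only a minimal sketch of the general idea (fixed-width keys whose string order matches numeric order), not a reconstruction of the posted methods. It swaps in a different, common technique: flip the sign bit and zero-pad to 16 hex digits. The class name SortableLongs is hypothetical, and Long.parseUnsignedLong assumes Java 8 or later.

```java
import java.util.Arrays;

// Hypothetical helper, not the code from the post: maps longs to 16-character
// hex keys so that String order equals numeric order.
final class SortableLongs {

    // Flipping the sign bit moves negative values below positive ones when the
    // bits are read as an unsigned number; zero-padding fixes the width.
    static String encodeLong(long num) {
        String hex = Long.toHexString(num ^ Long.MIN_VALUE);
        StringBuilder sb = new StringBuilder(16);
        for (int i = hex.length(); i < 16; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    // Inverse of encodeLong: parse as unsigned hex, then flip the sign bit back.
    static long decodeLong(String key) {
        return Long.parseUnsignedLong(key, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] values = {42L, -7L, 0L, Long.MIN_VALUE, Long.MAX_VALUE};
        String[] keys = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encodeLong(values[i]);
        }
        Arrays.sort(keys); // plain lexicographic sort
        for (String key : keys) {
            System.out.println(key + " -> " + decodeLong(key));
        }
    }
}
```

Because every key has the same length and uses lowercase hex digits, a lexicographic range over the keys selects the same values as a numeric range over the original longs.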

Re: Highlighter

2006-01-26 Thread msftblows
Yes, that is correct...you need to rewrite the query. I was actually the main developer for the 1.5 .NET port, so if you come across any issues, please email me at my hotmail address which I check more often than this one... -Joe Langley -Original Message- From: Gwyn Carwardine

RE : encoding

2006-01-26 Thread arnaudbuffet
Hello and thanks for your answer. I do not find the ISOLatin1AccentFilter class in my lucene jar, but I find one on google attach to this mail, could you tell me if it is the good one? I do not see anything in this class which can help me. This program will replace some accent characters but

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Paul, I tried this but it ran out of memory trying to read the 500Mb .fdt file. I tried various values for MAX_BBUF, but it still ran out of memory (I'm using -Xmx1600M, which is the jvm's maximum value (v1.5)) I'll give NioFSDirectory a try. Thanks, Peter On 1/26/06, Paul Elschot [EMAIL

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The throughput is worse with NioFSDirectory than with the FSDirectory (patched and unpatched). The bottleneck still seems to be synchronization, this time in NioFile.getChannel (7 of the 8 threads were blocked there during one snapshot). I tried this with 4 and 8 channels. The throughput

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?) We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons, Sun Java 1.5) -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Paul, I tried this but it ran out of memory trying to read the 500Mb .fdt file.

Re: RE : encoding

2006-01-26 Thread Erik Hatcher
On Jan 26, 2006, at 7:26 PM, arnaudbuffet wrote: I do not find the ISOLatin1AccentFilter class in my lucene jar, but I find one on google attach to this mail, could you tell me if it is the good one? This used to be in contrib/analyzers but has been moved into the core (Subversion only

Re: encoding

2006-01-26 Thread John Haxby
arnaudbuffet wrote: if I try to index a text file encoded in Western 1252 for example with the Turkish text düzenlediğimiz kampanyamıza the lucene index will contain re encoded data with #0;#17;k#0;#0; ISOLatin1AccentFilter.removeAccents() converts that string to duzenlediğimiz
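For comparison, here is a minimal sketch of accent stripping that does not use ISOLatin1AccentFilter at all. It relies on java.text.Normalizer from the JDK (Java 6 or later): decompose to NFD, then drop the combining marks. The class name AccentStripper is hypothetical, and the non-ASCII letters are written as Unicode escapes so the source file's own encoding does not matter.

```java
import java.text.Normalizer;

// Hypothetical helper: strips diacritics via Unicode decomposition rather than
// a hand-written Latin-1 mapping table.
final class AccentStripper {

    static String stripAccents(String input) {
        // NFD splits a letter such as u-umlaut into 'u' plus a combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "d\u00fczenledi\u011fimiz kampanyam\u0131za" is the Turkish text from the thread.
        String turkish = "d\u00fczenledi\u011fimiz kampanyam\u0131za";
        System.out.println(stripAccents(turkish));
    }
}
```

With this approach the g-breve (U+011F) becomes a plain g, while the dotless i (U+0131) is left unchanged because it has no Unicode decomposition; a letter like that still needs an explicit mapping if it must be folded to ASCII.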

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. btw, I'm getting a sustained rate of 135 queries/sec with 4 threads, which is pretty impressive. Another way around the concurrency limit is to run

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
BEA Jrockit supports both AMD64 and Intel's EM64T (basically renamed AMD64) http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/jrockit/ and Sun's Java 1.5 for Windows AMD64 Platform They advertise AMD64, presumably because that's what their servers use, but it should work on

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Chris Hostetter
: The document number is the variable i in this case. : If the document number is the variable i (enumerated from numDocs()), : what's the difference between numDocs() and maxDoc() in this case? I : was previously under the impression that the internal docNum might be : different to the counter.
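The numDocs()/maxDoc() question can be illustrated with a toy model. In Lucene's IndexReader, numDocs() counts the documents that are not deleted, maxDoc() is one greater than the largest document number, and isDeleted(int) reports whether a number is a deleted slot. The sketch below is not Lucene code: ToyIndex and its prune helper are hypothetical stand-ins that only mimic those three methods, so the loop shape can run without an index on disk.

```java
import java.util.BitSet;

// Hypothetical in-memory stand-in for an index reader; it only mimics the
// numDocs()/maxDoc()/isDeleted() behaviour discussed in the thread.
final class ToyIndex {
    private final String[] docs;
    private final BitSet deleted = new BitSet();

    ToyIndex(String... docs) {
        this.docs = docs.clone();
    }

    int maxDoc() { return docs.length; }                          // highest doc number + 1
    int numDocs() { return docs.length - deleted.cardinality(); } // live documents only
    boolean isDeleted(int n) { return deleted.get(n); }
    void delete(int n) { deleted.set(n); }

    String document(int n) {
        if (deleted.get(n)) {
            throw new IllegalArgumentException("document " + n + " is deleted");
        }
        return docs[n];
    }

    // Walk every document number up to maxDoc(), skipping deleted slots.
    // Looping to numDocs() instead would stop early once anything is deleted.
    static int prune(ToyIndex index, String marker) {
        int removed = 0;
        for (int i = 0; i < index.maxDoc(); i++) {
            if (index.isDeleted(i)) {
                continue;
            }
            if (index.document(i).contains(marker)) {
                index.delete(i); // the loop counter is the document number
                removed++;
            }
        }
        return removed;
    }
}
```

In this model, deleting documents lowers numDocs() but leaves maxDoc() and the remaining document numbers unchanged, which is why the loop bound is maxDoc() together with an isDeleted() check.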

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Paul Elschot
On Thursday 26 January 2006 09:47, Chun Wei Ho wrote: Hi, Thanks for the help, just a few more questions: On 1/26/06, Paul Elschot [EMAIL PROTECTED] wrote: On Thursday 26 January 2006 09:15, Chun Wei Ho wrote: I am attempting to prune an index by getting each document in turn and

Re: Getting the document number (with IndexReader)

2006-01-26 Thread Paul Elschot
On Thursday 26 January 2006 19:44, Chris Hostetter wrote: : The document number is the variable i in this case. : If the document number is the variable i (enumerated from numDocs()), : what's the difference between numDocs() and maxDoc() in this case? I : was previously under the

Re: encoding

2006-01-26 Thread petite_abeille
Hello, On Jan 26, 2006, at 12:01, John Haxby wrote: I have a perl script here that I used to generate downgrading table for a C program. I can let you have the perl script as is, but if there's enough interest(*) I'll use it to generate, say, CompoundAsciiFilter since it converts compound

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Doug Cutting
Doug Cutting wrote: A 64-bit JVM with NioDirectory would really be optimal for this. Oops. I meant MMapDirectory, not NioDirectory. Doug

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Dumb question: does the 64-bit compiler (javac) generate different code than the 32-bit version, or is it just the jvm that matters? My reported speedups were solely from using the 64-bit jvm with jar files from the 32-bit compiler. Peter On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote: Nice

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Yonik Seeley
There is no difference in bytecode... the whole difference is just in the underlying JVM. -Yonik On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: Dumb question: does the 64-bit compiler (javac) generate different code than the 32-bit version, or is it just the jvm that matters? My reported

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Peter, Wow, the speed-up is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Correction: make that 285 qps :) On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote: I tried the AMD64-bit JVM

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The short answer is that you can make Lucene blazingly fast by using advice and design principles mentioned in this forum and of course reading 'Lucene in Action'. For example, use a 'content' field for searching all fields (vs multi-field search), put all your stored data in one field,

problem updating a document: no segments file?

2006-01-26 Thread John Powers
Hello, I have a couple of instances of Lucene. I just altered one implementation and now it's not keeping a segments file. While indexing occurs, there is a segments file, but once it's done, there isn't. All the other indexes have one. The problem comes when I try to update a document,

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches? Or use different FSDirectory implementations? Thanks! ray, On 1/27/06, Peter Keegan [EMAIL PROTECTED] wrote: Ray, The short answer is that you can make Lucene blazingly fast

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Peter Keegan
Ray, The 135 qps rate was using the standard FSDirectory in 1.9. Peter On 1/26/06, Ray Tsang [EMAIL PROTECTED] wrote: Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches? Or use different FSDirectory implementations?

Re: Two strange things in Lucene

2006-01-26 Thread Daniel Pfeifer
Since I didn't find anything in the log from log4j I did a kill -3 on the process and found two very interesting things: Almost all multisearcher threads were in this state: MultiSearcher thread #1 daemon prio=10 tid=0x01900960 nid=0x81442c waiting for monitor entry

How does the lucene normalize the score?

2006-01-26 Thread xing jiang
Hi, I want to know how Lucene normalizes the score. I see the Hits class has a function to get each document's score. But I don't know how Lucene calculates the normalized score, and 'Lucene in Action' only says "normalized score of the nth top scoring documents". -- Regards Jiang Xing

Re: Performance tips?

2006-01-26 Thread Chris Lamprecht
I seem to say this a lot :), but, assuming your OS has a decent filesystem cache, try reducing your JVM heapsize, using an FSDirectory instead of RAMDirectory, and see if your filesystem cache does ok. If you have 12GB, then you should have enough RAM to hold both the old and new indexes during