speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley
Hi, I'm trying to figure out how to speed up queries to a large index. I'm currently getting 133 req/sec, which isn't bad, but isn't too close to MySQL, which is getting 500 req/sec on the same hardware with the same set of documents. Setup info Stats: - 4.3M documents, 12 keyword fields per

Re: speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley
--- Otis Gospodnetic [EMAIL PROTECTED] wrote: "The bottleneck seems to be disk IO." But it's not. Linux is caching the whole file, and there really isn't any disk activity at all. Most of the threads are blocked on InputStream.refill, not waiting for the disk, but waiting for their turn into

Re: speeding up queries (MySQL faster)

2004-08-22 Thread Yonik Seeley
Oops, CPU usage is *not* 50%, but closer to 98%. This is due to a bug in CPU% on RHEL 3 on multiprocessor machines (I can run multiple threads in while(1) loops, and it will still only show 50% CPU usage for that process). The aggregated (not per-process) statistics shown by top are correct, and they

Re: speeding up queries (MySQL faster)

2004-08-22 Thread Yonik Seeley
For example, Nutch automatically translates such clauses into QueryFilters. Thanks for the excellent pointer Doug! I will definitely be implementing this optimization. If anyone cares, I did a 1-minute hprof test with the search server in a servlet container. Here are the results (sorry

Re: speeding up queries (MySQL faster)

2004-08-27 Thread Yonik Seeley
, but the CPU utilization decreased to around 55% (in both configurations above). I'll have to look into that later, but any additional performance at this point is pure gravy. -Yonik --- Yonik Seeley [EMAIL PROTECTED] wrote: Doug wrote: For example, Nutch automatically translates such clauses

Re: Atomicity in Lucene operations

2004-10-18 Thread Yonik Seeley
Hi Nader, I would greatly appreciate it if you could CC me on the docs or the code. Thanks! Yonik --- Nader Henein [EMAIL PROTECTED] wrote: It's pretty integrated into our system at this point, I'm working on Packaging it and cleaning up my documentation and then I'll make it available, I

Re: WildcardTermEnum skipping terms containing numbers?!

2004-11-17 Thread Yonik Seeley
test

Re: version documents

2004-11-18 Thread Yonik Seeley
This won't fully work. You still need to delete the original from the Lucene index to keep it from showing up in searches. Example: myfile v1 contains "I want a cat" and myfile v2 contains "I want a dog". If you change cat to dog in myfile and then search for cat, you will *only* get v1 and hence the sort on
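The cat/dog example can be reproduced with a toy inverted index. This is only a sketch to illustrate the point: `ToyIndex`, `delete_by_name`, and `update` are hypothetical names invented here, not Lucene's API, and whitespace tokenization is an assumption.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny in-memory inverted index; a stand-in, not Lucene's API."""

    def __init__(self):
        self.docs = {}                      # doc id -> (name, version, text)
        self.postings = defaultdict(set)    # term -> set of doc ids
        self.next_id = 0

    def add(self, name, version, text):
        """Index one version of a file and return its doc id."""
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = (name, version, text)
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def delete_by_name(self, name):
        """Remove every indexed version of `name`; return how many were removed."""
        stale = [d for d, (n, _, _) in self.docs.items() if n == name]
        for doc_id in stale:
            _, _, text = self.docs.pop(doc_id)
            for term in text.lower().split():
                self.postings[term].discard(doc_id)
        return len(stale)

    def search(self, term):
        """Return (name, version) for each document containing `term`."""
        ids = sorted(self.postings.get(term.lower(), ()))
        return [self.docs[i][:2] for i in ids]


def update(index, name, version, text):
    """Delete the old version first, then add the new one."""
    index.delete_by_name(name)
    return index.add(name, version, text)
```

If both versions are simply added, `search("cat")` returns `[("myfile", 1)]`, so the stale v1 is the only hit. If v2 goes in through `update`, `search("cat")` returns nothing and `search("dog")` returns `[("myfile", 2)]`.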

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Yonik Seeley
I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very small number, then using a RangeFilter will probably be slower than a RangeQuery. -Yonik See, this is what I'm not getting: what is the advantage of the second world? :) ... in
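One way to see why the sizes matter is to count how many document ids each strategy has to look at. The sketch below is a toy model, not Lucene code. It assumes that a filter is applied by testing every hit of q1 against a precomputed set, while a range clause inside the query can take part in a skipping intersection driven by the smaller id list. Both function names are hypothetical.

```python
from bisect import bisect_left


def post_filter(query_hits, allowed):
    """Check every query hit against a precomputed set of allowed ids."""
    visited = 0
    matches = []
    for doc in query_hits:
        visited += 1
        if doc in allowed:
            matches.append(doc)
    return matches, visited


def skipping_intersection(small, large):
    """Intersect two sorted id lists, driving from the smaller one."""
    visited = 0
    matches = []
    pos = 0
    for doc in small:
        visited += 1
        pos = bisect_left(large, doc, pos)   # skip ahead in the large list
        if pos == len(large):
            break
        if large[pos] == doc:
            matches.append(doc)
    return matches, visited


if __name__ == "__main__":
    q1_hits = list(range(0, 1_000_000, 2))   # query matches many documents
    range_hits = [10, 11, 500, 999_998]      # range matches very few
    print(post_filter(q1_hits, set(range_hits))[1])
    print(skipping_intersection(range_hits, q1_hits)[1])
```

Under these assumptions the post-filter walks all of q1's hits, while the intersection only walks the handful of ids in the range. Real costs also depend on how the filter's bit set is built and whether it is cached, which this model leaves out.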

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Yonik Seeley
Hmmm, scratch that. I explained the tradeoff of a filter vs a range query - not between the different types of filters you talk about. --- Yonik Seeley [EMAIL PROTECTED] wrote: I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very

IndexWriter.addIndexes efficiency

2004-11-28 Thread Yonik Seeley
I'd like to use addIndexes(Directory[] dirs) to add batches of documents to a main index. My main problem is that the addIndexes() implementation calls optimize() at the beginning and the end. Now, my main index will be ~25GB in size, so adding a single document and then doing an optimize will

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Yonik Seeley
6. Index locally and synchronize changes periodically. This is an interesting idea and bears looking into. Lucene can combine multiple indexes into a single one, which can be written out somewhere else, and then distributed back to the search nodes to replace their existing index. This is a