Hi,
I'm trying to figure out how to speed up queries to a large index. I'm currently getting 133 req/sec, which isn't bad, but isn't close to MySQL, which gets 500 req/sec on the same hardware with the same set of documents.
Setup stats:
- 4.3M documents, 12 keyword fields per document
--- Otis Gospodnetic [EMAIL PROTECTED] wrote:
The bottleneck seems to be disk IO.
But it's not. Linux is caching the whole file, and there really isn't any disk activity at all. Most of the threads are blocked on InputStream.refill: not waiting for the disk, but waiting for their turn into
Oops, CPU usage is *not* 50%, but closer to 98%. This is due to a bug in CPU% reporting on RHEL 3 on multiprocessor CPUs (I can run multiple threads in while(1) loops, and it will still only show 50% CPU usage for that process). The aggregated (not per-process) statistics shown by top are correct, and they
For example, Nutch automatically translates such clauses into QueryFilters.
Thanks for the excellent pointer, Doug! I'll definitely be implementing this optimization.
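For anyone unfamiliar with the technique, here is a stdlib-only sketch of what a cached filter buys you. It is not Lucene's actual QueryFilter code; the corpus, field names, and class names below are all made up for illustration. The idea is that a restricting clause (say, a site: restriction) is evaluated once into a BitSet, cached, and then applied to each query's hits with a cheap bitwise AND instead of being re-scored every time:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a restricting clause becomes a cached BitSet of doc ids,
// and applying it to a query's result set is a bitwise AND.
public class FilterSketch {
    static final int NUM_DOCS = 8;
    // doc id -> value of a hypothetical "site" keyword field (toy corpus)
    static final String[] SITE = {"a.com", "b.com", "a.com", "c.com",
                                  "a.com", "b.com", "a.com", "c.com"};
    static final Map<String, BitSet> cache = new HashMap<>();

    // Build (or fetch from cache) the bit set for a site restriction.
    static BitSet siteFilter(String site) {
        return cache.computeIfAbsent(site, s -> {
            BitSet bits = new BitSet(NUM_DOCS);
            for (int doc = 0; doc < NUM_DOCS; doc++)
                if (SITE[doc].equals(s)) bits.set(doc);
            return bits;
        });
    }

    // Apply the cached filter to a query's matches: just an AND.
    static BitSet filtered(BitSet queryHits, String site) {
        BitSet out = (BitSet) queryHits.clone();
        out.and(siteFilter(site));
        return out;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(0, 6);                       // pretend the query matched docs 0..5
        System.out.println(filtered(hits, "a.com"));  // only a.com docs survive
    }
}
```

The payoff is that the full-index pass happens once per distinct filter, not once per query.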
If anyone cares, I did a 1-minute hprof test with the search server in a servlet container. Here are the results (sorry), but the CPU utilization decreased to around 55% (in both configurations above). I'll have to look into that later, but any additional performance at this point is pure gravy.
-Yonik
--- Yonik Seeley [EMAIL PROTECTED] wrote:
Doug wrote:
For example, Nutch automatically translates such clauses
Hi Nader,
I would greatly appreciate it if you could CC me on
the docs or the code.
Thanks!
Yonik
--- Nader Henein [EMAIL PROTECTED] wrote:
It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I test
This won't fully work. You still need to delete the original out of the Lucene index to avoid it showing up in searches.
Example:
myfile v1: I want a cat
myfile v2: I want a dog
If you change cat to dog in myfile, and then do a search for cat, you will *only* get v1, and hence the sort on
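The stale-hit problem is easy to reproduce with a toy inverted index. The sketch below is stdlib-only and purely illustrative (in real Lucene of this era you would delete the old document, e.g. via IndexReader's delete by Term, before re-adding): re-adding the new version without deleting the old one leaves the old terms pointing at the stale entry.

```java
import java.util.*;

// Toy inverted index: term -> sorted set of doc ids. Illustrates why the old
// version of a document must be deleted before the new one is indexed.
public class UpdateSketch {
    static final Map<String, Set<String>> index = new HashMap<>();
    static final Map<String, List<String>> docTerms = new HashMap<>();

    static void add(String docId, String... terms) {
        docTerms.put(docId, Arrays.asList(terms));
        for (String t : terms)
            index.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
    }

    static void delete(String docId) {
        List<String> terms = docTerms.remove(docId);
        if (terms != null)
            for (String t : terms)
                index.getOrDefault(t, new TreeSet<>()).remove(docId);
    }

    static Set<String> search(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        add("myfile-v1", "i", "want", "a", "cat");   // original version
        add("myfile-v2", "i", "want", "a", "dog");   // naive re-add, no delete
        System.out.println("cat -> " + search("cat")); // stale v1 still matches
        delete("myfile-v1");                           // the step you must not skip
        System.out.println("cat -> " + search("cat")); // now empty
    }
}
```

Until the delete happens, a search for "cat" keeps returning the version that no longer exists on disk.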
I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very small number, then using a RangeFilter will probably be slower than a RangeQuery.
-Yonik
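A back-of-the-envelope cost model makes the tradeoff concrete. This is an assumption on my part, not a benchmark: the model charges a filter one full pass over the index to build its bit set (plus a cheap AND with the query's hits), while an inline range clause only does work roughly proportional to the documents it actually touches. All numbers and names are hypothetical.

```java
// Rough cost model for filter-vs-range-query. Units are "documents touched",
// not time; this is a sketch of the reasoning, not a measurement.
public class TradeoffSketch {
    static final int INDEX_SIZE = 1_000_000;

    // Filter style: full index pass to build the bit set, then AND with hits.
    static long filterWork(int queryHits) {
        return INDEX_SIZE + queryHits;
    }

    // Inline range clause: work proportional to docs matching either part.
    static long rangeQueryWork(int queryHits, int rangeHits) {
        return (long) queryHits + rangeHits;
    }

    public static void main(String[] args) {
        int queryHits = 500_000;  // query covers many documents
        int rangeHits = 100;      // range covers very few
        System.out.println("filter work: " + filterWork(queryHits));
        System.out.println("range-query work: " + rangeQueryWork(queryHits, rangeHits));
    }
}
```

With a broad query and a narrow range, the one-off full-index pass dominates, which is exactly the case where the filter loses; if the same filter is cached and reused across many queries, the pass amortizes away and the filter wins.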
See, this is what I'm not getting: what is the advantage of the second world? :) ... in
Hmmm, scratch that. I explained the tradeoff of a filter vs. a range query, not between the different types of filters you talk about.
--- Yonik Seeley [EMAIL PROTECTED] wrote:
I think it depends on the query. If the query (q1) covers a large number of documents and the filter covers a very
I'd like to use addIndexes(Directory[] dirs) to add batches of documents to a main index. My main problem is that the addIndexes() implementation calls optimize() at the beginning and the end. Now, my main index will be ~25GB in size, so adding a single document and then doing an optimize will
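The usual way around a per-addition optimize is to amortize the expensive merge: buffer incoming documents and fold them into the large index only when the buffer fills. The sketch below is stdlib-only and illustrative; it is not Lucene's API (there you would tune IndexWriter's merge behavior rather than call addIndexes per document), and the batch size is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Batching sketch: documents accumulate in a small buffer and are merged
// into the main index only when the buffer fills, so the costly merge step
// runs once per batch instead of once per document.
public class BatchSketch {
    static final int BATCH_SIZE = 3;
    static final List<String> mainIndex = new ArrayList<>();
    static final List<String> buffer = new ArrayList<>();
    static int merges = 0;

    static void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= BATCH_SIZE) flush();
    }

    static void flush() {              // stands in for the expensive merge/optimize
        mainIndex.addAll(buffer);
        buffer.clear();
        merges++;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) addDocument("doc" + i);
        flush();                        // pick up the partial last batch
        System.out.println("docs=" + mainIndex.size() + " merges=" + merges);
    }
}
```

Ten additions cost four merges here instead of ten; with a 25GB index the difference between per-document and per-batch merging is what makes incremental updates feasible at all.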
6. Index locally and synchronize changes periodically. This is an
interesting idea and bears looking into. Lucene can combine multiple
indexes into a single one, which can be written out somewhere else, and
then distributed back to the search nodes to replace their existing
index.