At 09:02 PM 11/27/01 +0000, Mark Maunder wrote:
>I'm using it on our site and searching fulltext
>indexes on three fields (including a large text field) in under 3 seconds on over
>70,000 records on a p550 with 490 megs of ram.
>
>

Hi Mark,

<plug>

Some day if you are bored, try indexing with swish-e (the development version).

http://swish-e.org

The big problem with it right now is that it doesn't do incremental indexing. One of the developers is trying to get that working within a few weeks. But for most small sets of files it's not an issue, since indexing is so fast.

My favorite feature is that it can run an external program, such as a perl mbox or html parser, a perl spider, a DBI program, or whatever, to get the source to index.

Use it with Cache::Cache and mod_perl and it's nice and fast from page to page of results.

Here's indexing only 24,000 files:

    > ./swish-e -c u -i /usr/doc
    Indexing Data Source: "File-System"
    Indexing "/usr/doc"
    270279 unique words indexed.
    4 properties sorted.
    23840 files indexed.
    177638538 total bytes.
    Elapsed time: 00:03:50 CPU time: 00:03:16
    Indexing done!

Here's searching:

    > ./swish-e -w install -m 1
    # SWISH format: 2.1-dev-24
    # Search words: install
    # Number of hits: 2202
    # Search time: 0.006 seconds
    # Run time: 0.011 seconds

A phrase:

    > ./swish-e -w '"public license"' -m 1
    # SWISH format: 2.1-dev-24
    # Search words: "public license"
    # Number of hits: 348
    # Search time: 0.007 seconds
    # Run time: 0.012 seconds
    998 /usr/doc/packages/ijb/gpl.html "gpl.html" 26002

A wild card and boolean search:

    > ./swish-e -w 'sa* or java' -m 1
    # SWISH format: 2.1-dev-24
    # Search words: sa* or java
    # Number of hits: 7476
    # Search time: 0.082 seconds
    # Run time: 0.087 seconds

Or a good number of results:

    > ./swish-e -w 'is or und or run' -m 1
    # SWISH format: 2.1-dev-24
    # Search words: is or und or run
    # Number of hits: 14477
    # Search time: 0.084 seconds
    # Run time: 0.089 seconds

Or everything:

    > ./swish-e -w 'not dksksks' -m 1
    # SWISH format: 2.1-dev-24
    # Search words: not dksksks
    # Number of hits: 23840
    # Search time: 0.069 seconds
    # Run time: 0.074 seconds

This is pushing the limit for little old swish, but here's indexing a few more very small xml files (~150 bytes each):

    3830016 files indexed.
    582898349 total bytes.
    Elapsed time: 00:48:22 CPU time: 00:44:01

</plug>

Bill Moseley
mailto:[EMAIL PROTECTED]
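
P.S. If you want to pull the numbers out of the header lines shown above from a script, something like the following rough sketch might do. It is Python rather than perl, the function names parse_header and seconds are made up for illustration, and it assumes the header lines always look like "# Key: value" as in the output pasted above. It also works out the indexing rate from the first run.

```python
import re

def parse_header(output: str) -> dict:
    """Collect '# Key: value' header lines from search output into a dict.

    Assumes the header layout shown in the pasted output; result lines
    (which do not start with '#') are skipped.
    """
    fields = {}
    for line in output.splitlines():
        m = re.match(r"#\s*([^:]+):\s*(.*)", line)
        if m:
            fields[m.group(1).strip()] = m.group(2).strip()
    return fields

def seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' elapsed time into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Header copied from the "install" search above.
sample = """\
# SWISH format: 2.1-dev-24
# Search words: install
# Number of hits: 2202
# Search time: 0.006 seconds
# Run time: 0.011 seconds
"""

header = parse_header(sample)
hits = int(header["Number of hits"])
search_time = float(header["Search time"].split()[0])

# Indexing rate for the /usr/doc run: 23840 files in 00:03:50.
files_per_sec = 23840 / seconds("00:03:50")

print(hits, search_time, round(files_per_sec, 1))
```

By the same arithmetic, the big xml run (3830016 files in 00:48:22, i.e. 2902 seconds) comes to roughly 1,320 files per second, which makes sense given how tiny those files are.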