Search performance using BooleanQueries in BooleanQueries

2007-10-26 Thread Ard Schrijvers
Hello, I am seeing that a query with boolean queries in boolean queries takes much longer than just a single boolean query when the number of hits is fairly large. For example +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e is much faster than (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d)

Re: lucene indexing doubts

2007-10-26 Thread Karl Wettin
26 okt 2007 kl. 06.31 skrev poojasreejith: I have a folder which contains the indexed files. Suppose I want to add one more indexed file to it, without deleting the whole folder and performing the indexing for all the files again. I want it to do only that one file and add the

Re: Java Heap Space -Out Of Memory Error

2007-10-26 Thread Sebastin
Hi All, is it now possible to release the memory after every search in Lucene for 50 GB of records? testn wrote: I think you store dateSc with full precision, i.e. with time. You should consider indexing just the date part, or only to the resolution you really need. It should reduce the

Scaling out Lucene /general architecture Q

2007-10-26 Thread Mankowski, Chris
I'm new to Lucene and am interested in learning how enterprises deploy multi-server installations of Lucene for large 24x7 operations. The first question that comes to mind is: are most of the design decisions made during development time, or can a simple server be 'grown into' something

Re: lucene indexing doubts

2007-10-26 Thread mark harwood
Guessing your problem here too, but see http://www.htxs.nl/docs/lucene/docs/api/org/apache/lucene/demo/IndexHTML.html. It shows an approach to incremental indexing which updates an index with only the changed files in a folder.

fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Zdeněk Vráblík
Hi all, How can I enable fuzzy search in MultifieldQueryParser? It works if the query string ends with ~, but how do I switch it on for the whole query? I would like to search without fuzzy matching first and, if nothing is found, search again with fuzzy search. Thanks. Regards, Zdenek

Exit a search when have enough results

2007-10-26 Thread John Patterson
Hi, I am doing a simple conjunction search for documents that do not need to be scored or sorted and was wondering if there is a way to stop the search from a hit collector when I have enough hits? I guess I am after a hit collector that can return a boolean determining if the search should

Cache BitSet or doc number?

2007-10-26 Thread John Patterson
Hi, I am thinking about caching search results for common queries and just want to check that for small numbers of results it would be better to store the doc numbers as ints or shorts than to store a Filter with a BitSet. I guess if your results contain fewer than 1/32 or 1/16 of the number of
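The 1/32 and 1/16 figures in the question follow from simple arithmetic: a bit set costs one bit per document in the index, while a list costs 32 bits per hit as ints or 16 bits per hit as shorts. A minimal sketch of that comparison, assuming only payload bytes are counted (object headers and Filter overhead are ignored) and using hypothetical names such as DocCacheSizing that do not come from Lucene or Solr:

```java
import java.util.BitSet;

class DocCacheSizing {
    /** Payload bytes of a bit set with one bit per document in the index. */
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 7) / 8;
    }

    /** Payload bytes of an int[] holding one 4-byte doc number per hit. */
    static long intArrayBytes(int hitCount) {
        return 4L * hitCount;
    }

    /** True when storing the hits as ints takes less space than a bit set. */
    static boolean intsAreSmaller(int hitCount, int maxDoc) {
        return intArrayBytes(hitCount) < bitSetBytes(maxDoc);
    }

    /** Converts a cached bit set into a sorted array of doc numbers. */
    static int[] toDocIds(BitSet bits) {
        return bits.stream().toArray();
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000; // assumed index size, for illustration only
        System.out.println(intsAreSmaller(1_000, maxDoc));  // 4,000 bytes vs 125,000 bytes
        System.out.println(intsAreSmaller(40_000, maxDoc)); // 160,000 bytes vs 125,000 bytes
    }
}
```

Shorts halve the per-hit cost, which moves the break-even point to 1/16 of the index size, but they can only represent doc numbers that fit in 16 bits.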

Re: Cache BitSet or doc number?

2007-10-26 Thread Thom Nelson
Check out the HashDocSet from Solr, this is the best way to cache small sets of search results. In general, the Solr BitSet/DocSet classes are more efficient than using the standard java.util.BitSet. You can use these independent of the rest of Solr (though I recommend checking out Solr if

Re: Exit a search when have enough results

2007-10-26 Thread Yonik Seeley
On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: I am doing a simple conjunction search for documents that do not need to be scored or sorted and was wondering if there is a way to stop the search from a hit collector when I have enough hits? The easiest way would be to throw an exception

Re: Exit a search when have enough results

2007-10-26 Thread John Patterson
Yonik Seeley wrote: The easiest way would be to throw an exception from a custom hit collector (and then catch it yourself and continue). Cheers, I wonder if the performance penalty from throwing an exception is worth it?
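The pattern suggested here can be sketched without any Lucene dependency. In the sketch below, Collector and search are hypothetical stand-ins for a hit collector callback and a searcher, not the Lucene classes, and EnoughHits is an assumed name. The exception is created without a stack trace, which is one common way to reduce the cost the question worries about:

```java
import java.util.ArrayList;
import java.util.List;

class EarlyExitDemo {
    /** Stand-in for a hit collector callback (hypothetical, not the Lucene class). */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Control-flow signal; suppression and stack trace capture are disabled to keep it cheap. */
    static final class EnoughHits extends RuntimeException {
        EnoughHits() {
            super("enough hits", null, false, false);
        }
    }

    /** Stand-in for the searcher: feeds every matching doc number to the collector. */
    static void search(int[] matchingDocs, Collector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    /** Collects at most limit hits, aborting the search as soon as the limit is reached. */
    static List<Integer> firstHits(int[] matchingDocs, int limit) {
        List<Integer> hits = new ArrayList<>();
        if (limit <= 0) {
            return hits;
        }
        try {
            search(matchingDocs, (doc, score) -> {
                hits.add(doc);
                if (hits.size() >= limit) {
                    throw new EnoughHits(); // unwinds out of search()
                }
            });
        } catch (EnoughHits stop) {
            // Expected: the collector asked to stop early.
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(firstHits(new int[] {3, 7, 9, 12, 40}, 2));
    }
}
```

The exception is thrown at most once per search, so its cost is paid a single time rather than per hit.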

Re: Cache BitSet or doc number?

2007-10-26 Thread John Patterson
Thom Nelson wrote: Check out the HashDocSet from Solr, this is the best way to cache small sets of search results. In general, the Solr BitSet/DocSet classes are more efficient than using the standard java.util.BitSet. You can use these independent of the rest of Solr (though I

Re: fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Daniel Naber
On Friday 26 October 2007 19:06, Zdeněk Vráblík wrote: It works if query string ends with ~, but how to switch it on for all query? That's not supported AFAIK. You will need to iterate over the query (recursively if it's an instance of BooleanQuery) and create a new query where all parts are
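The recursive rewrite described in this reply can be illustrated with a small model. The Query, TermQuery, FuzzyQuery and BooleanQuery types below are simplified stand-ins defined for this sketch only; they are not the Lucene classes, and they leave out details such as the required/optional flag on each boolean clause, which a real rewrite would need to carry over:

```java
import java.util.ArrayList;
import java.util.List;

class FuzzyRewriteSketch {
    /** Minimal stand-in query model (hypothetical, not the Lucene API). */
    interface Query {}

    static final class TermQuery implements Query {
        final String field;
        final String text;
        TermQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    static final class FuzzyQuery implements Query {
        final String field;
        final String text;
        FuzzyQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    static final class BooleanQuery implements Query {
        final List<Query> clauses = new ArrayList<>();
    }

    /** Returns a copy of the query tree in which every term query is replaced by a fuzzy query. */
    static Query toFuzzy(Query query) {
        if (query instanceof BooleanQuery) {
            BooleanQuery rewritten = new BooleanQuery();
            for (Query clause : ((BooleanQuery) query).clauses) {
                rewritten.clauses.add(toFuzzy(clause)); // recurse into nested queries
            }
            return rewritten;
        }
        if (query instanceof TermQuery) {
            TermQuery term = (TermQuery) query;
            return new FuzzyQuery(term.field, term.text);
        }
        return query; // other query types are left unchanged
    }
}
```

With a rewrite like this, the two-step search from the original question becomes: run the parsed query as-is, and only if it returns no hits run the rewritten fuzzy version.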

Re: Exit a search when have enough results

2007-10-26 Thread Yonik Seeley
On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: Yonik Seeley wrote: The easiest way would be to throw an exception from a custom hit collector (and then catch it yourself and continue). Cheers, I wonder if the performance penalty from throwing an exception is worth it? If you

Re: Cache BitSet or doc number?

2007-10-26 Thread Yonik Seeley
On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: Thom Nelson wrote: Check out the HashDocSet from Solr, this is the best way to cache small sets of search results. In general, the Solr BitSet/DocSet classes are more efficient than using the standard java.util.BitSet. You can use

Sorted Index

2007-10-26 Thread John Patterson
Hi, What's the best way to maintain an index that is sorted?

Re: Sorted Index

2007-10-26 Thread Yonik Seeley
On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: What's the best way to maintain an index that is sorted? Most things in an inverted index are sorted (terms, matching document ids, term positions within a field, etc). Can you be more specific about what you are trying to accomplish? -Yonik

Re: Sorted Index

2007-10-26 Thread John Patterson
Yonik Seeley wrote: On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: Most things in an inverted index are sorted (terms, matching document ids, term positions within a field, etc). Can you be more specific about what you are trying to accomplish? Sorry, I mean sorting the

Re: HTML analyzer

2007-10-26 Thread Cool Coder
Thanks Ketin for your input. There is already a built-in HTML strip reader, i.e. HTMLStripReader in Solr, which I am currently using to strip all HTML tags before creating the index. This also solved my earlier problem related to the highlighter, which was highlighting HTML tags, e.g. I was searching for