Thanks Erik, I will investigate Filters and I'll see then.
On Wed, 9 Mar 2005 14:43:58 -0500, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Mar 9, 2005, at 10:09 AM, javier muguruza wrote: > > (I sent this to the old list, I dont know wether it reached the > > list...just in case I repost it) > > > > Hi all, > > > > We index our documents in the following way: > > > > doc = new Document(); > > // mailid > > doc.add(Field.UnIndexed("mid",mid)); > > //body > > doc.add(Field.UnStored("body", textb)); > > > > mid is a unique identifier, and body contains long pieces of text to > > be indexed. > > > > And later make searches on the body field, the mid allows us to find a > > file on the filesystem with a compressed (and digitally signed) > > version of the original body indexed. > > Our way to work in a query in our app is this: > > 1. first we make a search in a db (for many different reasons) that > > returns a number (from 0 to thousands) of mid > > 2. we use lucene to search for some text in many indexes, this returns > > a second list of mid > > 3. we return the result as the intersection of both lists. > > > > This is working fine right now, but wonder wether we are not using > > lucene to the fullest, cause we could also store mid as a keyword > > (instead of unindexed), and add the condition (AND mid==[any mid from > > our step 1]) to the lucene query we run. My questions are: > > > > 1. Is there a limit in the number of conditions I can add to a query?? > > Sometimes we have 10 mids, other times we have thousands of them so we > > would have to add: AND (mid:mid1 OR mid:mid2 ... OR mid:mid10000). > > Probably there is a limit, and we could only apply the mid conditions > > when the number or mids returned by step 1 is smaller than that limit? > > BooleanQuery has a built-in limit of 1,024 clauses so it would only be > useful when there is a small number of mids. Consider using a Filter > though. There are some built-in ones, but maybe a custom one is best. > > > 2. As the mid is a unique identifier (I guest lucene does not care > > about that right?) > > Right, Lucene doesn't care about field/term uniqueness. > > > , and the condition on the mid woudl be ANDed to the > > text query conditions, will it be faster for lucene to look first in > > the mid field and dont do the text lookup if the mid condition is not > > fullfilled? I dont know wether I am clear enough...Will I get some > > benefit on the queries by adding some additional conditions or the > > cost of adding another field to index will not pay off? Maybe it > > depends on the number of documents? Maybe it would be best to set mid > > as a keyword just in case, and add it as conditions later if the > > searches take too long? > > I doubt you'd even notice the difference. There is little cost to > adding the additional field, and looks like you'd benefit from having > mid as a Keyword. > > Also, with a Filter, you could use it to bounce to your relational > database to constrain results based on a set of mids. Filters are > designed to be used for multiple queries and cached - keep that in mind > and maybe it'll work out well in your scenario. > > Erik > > > > > > thanks for any though on that > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]