Re: Question about proximity searching and wildcards

Morus Walter Tue, 18 Jan 2005 23:53:13 -0800

Mariella Di Giacomo writes:
> Hello,
> 
> We are using Lucene to index scientific articles.
> We are also using Luke to verify the fields and values we index.
> 
> One of the fields we index is the author field that consists of the authors 
> that have written the scientific article (an example of such data is shown 
> at the bottom of the email).
> 
> The most common search on the author field is the following:
> 
> "find all the authors whose last name starts with Cole and the first name 
> starts with S"
> 
> We thought of a proximity search (we want to make sure we match the first 
> name and not the middle name/initial), similar to this:
> 
The query parser cannot do that.


> "Author:cole* S*"~1

In that form, the wildcard terms inside the quoted phrase are not expanded.

> "Author:cole* AND Author:S*"~1

You cannot mix boolean queries and proximity queries.

The closest thing to your query is PhrasePrefixQuery, but that is designed
for something like 'Cole S*', not 'Cole* S*'.
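For illustration, the following stdlib-only Java sketch spells out what "cole* s*"~1 is meant to match within a single author field: a token starting with the last-name prefix, followed by a token starting with the first-name prefix, with at most maxGap tokens in between. The class and method names are hypothetical, the tokens are assumed to be lower-cased already, and the ordered gap check is a simplification of Lucene's phrase slop (which also permits reordered terms at larger slop values). It is not how Lucene evaluates a query against the index.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: does a tokenized field contain a token starting with
 * lastPrefix, followed within maxGap intervening tokens by a token starting
 * with firstPrefix? Ordered only; a simplification of Lucene's slop.
 */
public class ProximityPrefixMatch {

    public static boolean matches(List<String> tokens, String lastPrefix,
                                  String firstPrefix, int maxGap) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).startsWith(lastPrefix)) {
                continue; // not a candidate last name
            }
            // Inspect up to maxGap + 1 positions after the candidate last name.
            int end = Math.min(tokens.size(), i + maxGap + 2);
            for (int j = i + 1; j < end; j++) {
                if (tokens.get(j).startsWith(firstPrefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical, already lower-cased author tokens for one article.
        List<String> tokens = Arrays.asList("coleman", "s", "j", "smith", "a", "b");
        System.out.println(matches(tokens, "cole", "s", 1));  // true
        System.out.println(matches(tokens, "smith", "b", 0)); // false
    }
}
```

Checking a single document this way is easy; the difficulty is finding the matching documents through the index without enumerating every expansion of both prefixes.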

Searching for 'Cole* S*' means searching for all combinations of the possible
expansions of Cole* and S*. You can do that by expanding the terms yourself,
but I'd expect it a) to be slow and b) to run into the maximum number of
boolean clauses (or high memory usage).
If there are, say, 10 expansions of Cole* and 500 of S* (and that's not just
first names, but all names starting with S), you have to do 5000 proximity
searches.
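A minimal sketch of "expanding the terms yourself", assuming the field's terms are available as a sorted set (a stand-in for walking Lucene's term dictionary; the names PrefixExpansion, expand and combinations are hypothetical). Because terms sharing a prefix are contiguous in sorted order, each expansion is a range scan, and the number of proximity queries is the product of the two expansion sizes, so 10 * 500 gives 5000. Combined into a single BooleanQuery, that many clauses would also exceed Lucene's default limit of 1024 boolean clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: expand prefixes against a sorted term set and count combinations. */
public class PrefixExpansion {

    /** Returns every term in the dictionary that starts with the given prefix. */
    public static List<String> expand(SortedSet<String> dictionary, String prefix) {
        List<String> result = new ArrayList<>();
        // tailSet begins at the first term >= prefix; matching terms are contiguous.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the block of terms sharing the prefix
            }
            result.add(term);
        }
        return result;
    }

    /** Number of two-term proximity queries needed to cover every combination. */
    public static long combinations(List<String> first, List<String> second) {
        return (long) first.size() * second.size();
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>();
        dictionary.add("cole");
        dictionary.add("coleman");
        dictionary.add("colet");
        dictionary.add("s");
        dictionary.add("sam");
        dictionary.add("smith");
        dictionary.add("zhang");

        List<String> lastNames = expand(dictionary, "cole");
        List<String> firstNames = expand(dictionary, "s");
        System.out.println(lastNames);                            // [cole, coleman, colet]
        System.out.println(firstNames);                           // [s, sam, smith]
        System.out.println(combinations(lastNames, firstNames));  // 9
    }
}
```

Note that "smith", a last name, shows up in the S* expansion: a single author field mixes first and last names, so the expansion cannot be restricted to first names.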
 
> If Luke cannot deal with that, which query should we build in the Java 
> application to get the expected result?
> Do we need to use a query filter?
> 
In this case I would use separate fields for the first and last names.
And if searching on the first character of the first name matters, I'd
also index that character in a field of its own.
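A stdlib-only sketch of the separate-fields idea, assuming one record per author (the AuthorFields and AuthorRecord names and the record layout are hypothetical, not Lucene API). The query becomes a prefix test on the last-name field plus an exact test on an extra first-initial field; in Lucene terms this would roughly correspond to a PrefixQuery and a TermQuery combined in a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: per-author records with separate name fields. */
public class AuthorFields {

    static final class AuthorRecord {
        final String articleId;
        final String lastName;     // normalized to lower case
        final String firstName;    // normalized to lower case
        final String firstInitial; // extra field: first character of firstName

        AuthorRecord(String articleId, String lastName, String firstName) {
            this.articleId = articleId;
            this.lastName = lastName.toLowerCase(Locale.ROOT);
            this.firstName = firstName.toLowerCase(Locale.ROOT);
            this.firstInitial = this.firstName.isEmpty() ? "" : this.firstName.substring(0, 1);
        }
    }

    /** Article ids having an author with lastName starting with lastPrefix and the given first initial. */
    public static List<String> search(List<AuthorRecord> records, String lastPrefix, String initial) {
        String prefix = lastPrefix.toLowerCase(Locale.ROOT);
        String init = initial.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (AuthorRecord r : records) {
            // Both conditions are checked against the same author record.
            if (r.lastName.startsWith(prefix) && r.firstInitial.equals(init)) {
                hits.add(r.articleId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<AuthorRecord> records = new ArrayList<>();
        records.add(new AuthorRecord("a1", "Cole", "Stephen"));
        records.add(new AuthorRecord("a2", "Coleman", "Ann"));
        records.add(new AuthorRecord("a2", "Smith", "Sarah"));
        records.add(new AuthorRecord("a3", "Colet", "S."));
        System.out.println(search(records, "Cole", "S")); // [a1, a3]
    }
}
```

Article a2 is not returned even though it has a Cole* author and an author whose first name starts with S, because they are different people. If several authors were stored as repeated values of the same fields in one Lucene document, the two conditions could match different authors; keeping one record per author, as assumed here, avoids that.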

HTH
        Morus
