Re: Range Query Question

daniel rosher Fri, 25 Jul 2008 02:36:35 -0700

Hi Thomas,

I think one solution would be similar to the autocomplete function I've
implemented in Solr. You can set it up in Solr as follows:


FieldType:
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
            minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
            replacement="" replace="all"/>
  </analyzer>
</fieldType>

This can then match on the whole string OR part of the string. If you use
the QueryParser, you won't be using the query part of the analyzer above,
but I've included it for completeness. The core of it with regard to
wildcard search is the EdgeNGramFilterFactory.
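
To make the index-time chain concrete, here is a rough plain-Java sketch of what it would emit for a single field value. It is not Solr code: the class and method names are hypothetical, and it assumes the edge n-grams are taken from the front of the string, with the gram sizes passed in to match the minGramSize and maxGramSize settings above.

```java
// Illustration only: mimics the lowercase -> strip non a-z -> edge n-gram steps.
// EdgeNGramSketch and edgeNGrams are hypothetical names, not part of Solr.
class EdgeNGramSketch {

    // Lowercase the value, drop every character outside a-z, then return the
    // leading prefixes of length minGram..maxGram (capped at the cleaned length).
    static String[] edgeNGrams(String value, int minGram, int maxGram) {
        String cleaned = value.toLowerCase(java.util.Locale.ROOT).replaceAll("[^a-z]", "");
        int longest = Math.min(maxGram, cleaned.length());
        int count = Math.max(0, longest - minGram + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = cleaned.substring(0, minGram + i);
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints: a ab aba abal aball aballa aballad aballade aballadee aballadeer
        System.out.println(String.join(" ", edgeNGrams("A Balladeer", 1, 20)));
    }
}
```

Each of those prefixes ends up as an indexed term for the document, which is why a plain term can match on part of the string without a wildcard.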

Field:
<field name="name" type="autocomplete" indexed="true" stored="false"
required="false"/>

Your queries can then become, e.g.:

name:["aballadeer" TO "aperfectcirclf"] -- i.e. without wildcards.

Note that you'd need to do the work of the query analyzer up front, i.e.
lowercase the input and remove any non a-z chars. Additionally, the '*'
on the start term would need to be removed. The '*' on the end term would
also need to be removed and, if it was present, the last char of that term
increased by one. In this case 'e' becomes 'f'.
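
The up-front work described above could be sketched in Java roughly as follows. This is only an illustration under my own assumptions: the class and method names are hypothetical, the query is built as a plain string, and a trailing 'z' is simply incremented like any other char, which is a case not covered above.

```java
// Illustration only: prepares range bounds the way the query analyzer would,
// then bumps the last char of the end term when it carried a trailing '*'.
// RangeBoundsSketch and its methods are hypothetical names.
class RangeBoundsSketch {

    // Lowercase and remove anything that is not a-z (this also drops any '*').
    static String normalize(String raw) {
        return raw.toLowerCase(java.util.Locale.ROOT).replaceAll("[^a-z]", "");
    }

    // Start term: just the normalized text.
    static String lowerBound(String raw) {
        return normalize(raw);
    }

    // End term: normalized text, last char increased by one if '*' was present.
    static String upperBound(String raw) {
        boolean hadWildcard = raw.trim().endsWith("*");
        String term = normalize(raw);
        if (!hadWildcard || term.isEmpty()) {
            return term;
        }
        char last = term.charAt(term.length() - 1);
        // Assumption: no special handling for 'z', which would become '{'.
        return term.substring(0, term.length() - 1) + (char) (last + 1);
    }

    static String rangeQuery(String field, String from, String to) {
        return field + ":[\"" + lowerBound(from) + "\" TO \"" + upperBound(to) + "\"]";
    }

    public static void main(String[] args) {
        // prints: name:["aballadeer" TO "aperfectcirclf"]
        System.out.println(rangeQuery("name", "A Balladeer*", "A Perfect Circle*"));
    }
}
```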

I think you'd find this a much more efficient solution than using
wildcards, which can be a performance bottleneck.

Regards,
Dan

On Fri, 2008-07-25 at 10:53 +0200, Thomas Becker wrote:
> Hi all,
> 
> I need to replace some db queries with lucene due to response time 
> issues for sure. In this special case I need to do a range query on a 
> field and a prefix query. I'm trying to prepare and try my query in luke 
> with no success before migrating it to java.
> 
> I need to find all names starting with for example "A Balladeer" to "A 
> Perfect Circle" in the name field. The sort field is sortName (same 
> content as name, but untokenized for sorting).
> 
> I tried the following in luke which should give me a few hundred docs:
> 
> name:["A Balladeer*" TO "A Perfect Circle*"] - 0 results, also there 
> should be some
> name:["A Balladeer*" TO "B*" - >10k results, but also returns results 
> which have a string in the middle or end starting with A
> 
> I tried using sortName (untokenized) field instead:
> sortName:["A Balladeer*" TO "B*" - 25 results, all starting with A* 
> (guess since it's untokenized), but far less than expected again
> 
> Tried a couple of more (stupid) things with little success. I googled 
> around, but I'm kinda stuck here. So I'm asking the list. How can I 
> search all name/sortName fields in a range between "A Balladeer*" TO "A 
> Perfect Circle*" and get only terms back which are starting with that 
> terms? Is there a way to accomplish that in Java and try it in luke?
> 
> And is there a way to sort resultsets in luke?
> 
> Cheers,
> Thomas
>
Daniel Rosher
Developer
www.thehotonlinenetwork.com
