Re: Extracting fuzzy match terms

mark Tue, 28 Apr 2015 13:27:07 -0700

All Lucene queries implement extractTerms [1], and highlighter 
implementations use this API to get the expanded set of terms for 
wildcard, fuzzy, and similar queries.
This set of terms isn't exposed directly in Elasticsearch today, but you may 
be able to hack something together using scripts or a custom Java plugin: 
look at SearchContext.current().query().extractTerms().
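
For comparison, the highlighter route from earlier in the thread (quoted 
below) could be approximated on the client roughly as follows. This is only 
a sketch under assumptions: the class and method names are hypothetical, the 
sample fragments are made up, the tags are the default <em>/</em> pair, and 
lowercasing stands in for whatever the field's analyzer does. It only 
de-dupes terms from hits you have already fetched, so it does not address 
the cost of retrieving every hit.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Elasticsearch or Lucene.
final class HighlightTermCollector {

    // Returns the distinct terms found between preTag and postTag,
    // in the order they were first seen across all fragments.
    public static Set<String> collectTerms(List<String> fragments,
                                           String preTag, String postTag) {
        // Quote the tags so characters such as '<' and '/' are matched literally.
        Pattern tagged = Pattern.compile(
                Pattern.quote(preTag) + "(.*?)" + Pattern.quote(postTag));
        Set<String> terms = new LinkedHashSet<>();
        for (String fragment : fragments) {
            Matcher m = tagged.matcher(fragment);
            while (m.find()) {
                // Assumption: lowercase to mimic a lowercasing analyzer.
                terms.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up highlight fragments, as they might come back in search hits.
        List<String> fragments = Arrays.asList(
                "met <em>Graeme</em> and <em>grahem</em> yesterday",
                "then <em>graeme</em> again");
        System.out.println(collectTerms(fragments, "<em>", "</em>"));
        // prints [graeme, grahem]
    }
}
```

The resulting set could then be shown to the user, with any excluded terms 
dropped before the follow-up search is built.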


Cheers
Mark

[1] http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html#extractTerms(java.util.Set)


On Tuesday, April 28, 2015 at 12:00:49 PM UTC+1, Graham Turner wrote:
>
> Thanks Mark.
>
> I did wonder about the highlighter, but using it would mean potentially 
> retrieving every hit and parsing it, which feels pretty impractical for 
> large searches.  
>
> Presumably the fuzzy query has to identify a full list of matching terms 
> internally - is there any way we could somehow hook into this, or retrieve 
> the list separately to the query results?  A mechanism similar to the 
> suggester, just accepting a single fuzzy term or a wildcard term would be 
> perfect.  I appreciate this probably isn't a common request, but I'm sure 
> it would have other use cases.  Something to consider for a future release 
> perhaps?  :-)
>
> Cheers
>
> Graham
>
>
> On Monday, 27 April 2015 17:41:17 UTC+1, ma...@elastic.co wrote:
>>
>> Hi Graham,
>> If you were to use the highlighter functionality you would essentially 
>> "see what the search engine saw".
>> With some client-side coding you could parse out the expanded search 
>> terms because they would be surrounded by tags in matching docs.
>> Of course this wouldn't provide a de-duped list of terms, and it would be 
>> an inefficient way to return an exhaustive list of all expansions used, 
>> but it may be an approach to investigate. 
>>
>> Cheers
>> Mark
>>
>> On Monday, April 27, 2015 at 5:08:55 PM UTC+1, Graham Turner wrote:
>>>
>>> Hi,
>>>
>>> I'm working on a proof-of-concept for a client, replacing an existing 
>>> legacy search system with an Elasticsearch-based alternative.  One of the 
>>> requirements that comes from the existing system is that, when performing a 
>>> fuzzy or wildcard search, the user can view all the matching terms, and 
>>> include/exclude them manually from the subsequent search.
>>>
>>> Thus, if a fuzzy search for 'graham' is submitted (or a wildcard like 
>>> 'gr*m*'), it might match grayam, graeme, grahum, grahem, etc.  The users 
>>> want to be able to see this list of matched terms, then, for instance, 
>>> exclude 'grayam' from the expanded terms list, so that all the other 
>>> expansions are used, but not the specifically excluded one. 
>>>
>>> I’m struggling to retrieve this list of terms in the first place.  
>>> Ideally I’d like to submit a simple query for a fuzzy or wildcard term, and 
>>> have it return just the possible matching terms (up to a given limit).
>>>
>>> I’ve had reasonable success using the term suggester for fuzzy-type 
>>> responses, but can’t use this for wildcard expansions. 
>>>
>>> Is there a good way to do this using 'out-of-the-box' elastic 
>>> functionality?  
>>>
>>> Any advice / hints gratefully accepted!
>>>
>>> Thanks
>>>
>>> Graham
>>>
>>

