[
https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013831#comment-15013831
]
Ted Sullivan commented on SOLR-7539:
------------------------------------
Thanks [~markus17] - I agree that there is some maintenance involved, maybe even
a lot as you say, but IMO data curation is an important activity that is - or
should be - ongoing, i.e. to avoid the "garbage in, garbage out" problem. So
while I agree with you that maintenance is needed to improve precision, I would
argue that this should be done anyway, and that any gains in precision are
good things that can be pursued incrementally. Search can expose data quality
issues, so this maintenance would be ongoing as you say, but maybe where we
disagree is in whether this is necessary or egregious.
Note that the solution can be used in "filter" or "boost" mode, so although I
call it query autofiltering, you can also use it as query autoboosting - the
name I chose obviously indicates my bias, however :) I prefer to make this a
choice rather than to hard-code it, so boosting is a configurable option that
may be more satisfying to users who have data problems like the ones you
describe. In this case, any improvement in relevance is a win.
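To make the filter/boost distinction concrete, here is a minimal sketch in Python. It is not code from the patch: the function name to_solr_params, the example field "brand", and the default boost of 10 are all hypothetical. It only assumes Solr's standard fq (filter query) parameter and the bq (boost query) parameter of the edismax parser, and it leaves q untouched, which may differ from how the component rewrites the query.

```python
from typing import Dict, List, Tuple, Union

Params = Dict[str, Union[str, List[str]]]


def to_solr_params(query: str,
                   matches: List[Tuple[str, str]],
                   mode: str = "filter",
                   boost: float = 10.0) -> Params:
    """Turn detected (field, value) pairs into Solr request parameters.

    Hypothetical helper for illustration only. Values are assumed to
    contain no double quotes, so no escaping is done here.
    """
    clauses = [f'{field}:"{value}"' for field, value in matches]
    params: Params = {"q": query}
    if not clauses:
        return params  # nothing detected: pass the query through unchanged
    if mode == "filter":
        # fq restricts the result set to documents matching every clause.
        params["fq"] = clauses
    elif mode == "boost":
        # bq only raises the score of matching documents; nothing is excluded.
        params["defType"] = "edismax"
        params["bq"] = [f"{clause}^{boost:g}" for clause in clauses]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return params


filter_params = to_solr_params("red wing boots", [("brand", "red wing")])
boost_params = to_solr_params("red wing boots", [("brand", "red wing")],
                              mode="boost")
```

In filter mode a mislabeled document drops out of the results entirely, while in boost mode it is merely ranked lower, which is why boosting is the more forgiving choice for dirty data.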
Emitting the ambiguities may be a possibility too - are you thinking that this
would be like a "did you mean", where the component would suggest more precise
query contexts, i.e. give the user the chance to agree to what the autofilter
detects? This could work but may suffer from the "too many clicks" objection.
With clean data, I don't see the need for this, but as you say, data is rarely
clean. The question is whether some of the necessary cleanup can be done
automatically during indexing, or whether it requires manual intervention,
which would be difficult to scale.
> Add a QueryAutofilteringComponent for query introspection using indexed
> metadata
> --------------------------------------------------------------------------------
>
> Key: SOLR-7539
> URL: https://issues.apache.org/jira/browse/SOLR-7539
> Project: Solr
> Issue Type: New Feature
> Reporter: Ted Sullivan
> Priority: Minor
> Fix For: Trunk
>
> Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch,
> SOLR-7539.patch
>
>
> The Query Autofiltering Component provides a method of inferring user intent
> by mapping noun phrases that are typically used for faceted navigation to
> Solr filter or boost queries (depending on configuration settings) so that
> more precise user queries can be met with more precise results.
> The algorithm uses a "longest contiguous phrase match" strategy which allows
> it to disambiguate queries where single terms are ambiguous but phrases are
> not. It will work when there is structured information in the form of String
> fields that are normally used for faceted navigation. It works across fields
> by building a map of search term to index field using the Lucene FieldCache
> (UninvertingReader). This enables users to create free text, multi-term
> queries that combine attributes across facet fields - as if they had searched
> and then navigated through several facet layers. To address the problem of
> exact-match only semantics of String fields, support for synonyms (including
> multi-term synonyms) and stemming was added.
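The "longest contiguous phrase match" idea in the issue summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's implementation: the term-to-field map is a hand-written dictionary with invented fields and values (the component is described as building it from indexed String fields), the scan is a simple greedy left-to-right pass, and synonyms and stemming are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical term-to-field map; every field and value here is invented.
TERM_TO_FIELD: Dict[str, str] = {
    "red": "color",
    "red wing": "brand",
    "boots": "product_type",
    "work boots": "product_type",
}


def autofilter(query: str,
               term_to_field: Dict[str, str]
               ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Greedy longest-contiguous-phrase match over whitespace tokens.

    Returns the (field, phrase) pairs that were recognized and the
    tokens that matched nothing.
    """
    tokens = query.lower().split()
    matches: List[Tuple[str, str]] = []
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            field = term_to_field.get(phrase)
            if field is not None:
                matches.append((field, phrase))
                i = j  # skip past the matched phrase
                break
        else:
            # No phrase starting at this token is known; keep it as free text.
            leftover.append(tokens[i])
            i += 1
    return matches, leftover


# "red" alone is a color, but "red wing" is recognized as a brand because
# the longer phrase is tried first.
print(autofilter("red wing work boots", TERM_TO_FIELD))
print(autofilter("red leather boots", TERM_TO_FIELD))
```

Trying the longest span first is what resolves the ambiguity: the single term "red" would map to a color, but it is never consulted when "red wing" matches as a whole.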
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)