Re: Multiple Query-Time Analyzers in Solr

Trey Grainger Thu, 23 Nov 2017 23:13:20 -0800

Doug - see https://issues.apache.org/jira/browse/SOLR-6492.


I implemented something previously that accomplishes the stated goal (it's
part of Chapter 14 of *Solr in Action* <http://solrinaction.com>).
Specifically, it is a text field that allows you to dynamically change the
analyzer(s) at index time (on a per document basis) or at query time (on a
per-term basis) while using the same actual field in the index.

One interesting note - you can actually choose *multiple* analyzers per
field for the same document or query (you're not restricted to one, as in
your proposed example). For example, if you wanted to index or query text
in multiple languages at the same time on the same text, you could specify
the analyzer for each language and it would run your text (independently)
through them all prior to indexing or as part of the query construction.
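
To make the multi-analyzer idea concrete, here is a minimal sketch in Python. It is not the code from SOLR-6492 or the book; the analyzer functions, the "en"/"es" keys, and the analyze_with_all helper are all hypothetical stand-ins. It only shows the shape of the behavior: the same text is run independently through each selected analyzer and the resulting tokens are merged for the one field.

```python
import re

# Toy stand-ins for real per-language analysis chains (hypothetical,
# for illustration only; a real setup would use Solr/Lucene analyzers).
def analyze_english(text):
    """Lowercase, split on non-letters, drop a few English stopwords."""
    stopwords = {"the", "a", "of"}
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]

def analyze_spanish(text):
    """Lowercase, split on non-letters, drop a few Spanish stopwords."""
    stopwords = {"el", "la", "de"}
    return [t for t in re.findall(r"[a-záéíóúñü]+", text.lower()) if t not in stopwords]

# Assumed mapping from an analyzer key to its analysis function.
ANALYZERS = {"en": analyze_english, "es": analyze_spanish}

def analyze_with_all(text, keys):
    """Run `text` independently through each named analyzer and merge
    the tokens, keeping first-seen order and dropping duplicates."""
    merged = []
    for key in keys:
        for token in ANALYZERS[key](text):
            if token not in merged:
                merged.append(token)
    return merged

tokens = analyze_with_all("The house de la playa", ["en", "es"])
print(tokens)
```

This sketch flattens everything into one token list. A real implementation would more likely stack the tokens from each analyzer at the same positions so that phrase queries keep working.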

The syntax isn't elegant (feels a bit ugly since you can switch analyzers
per-term - but therein also lies tremendous flexibility), but it works. It
currently requires you to pass in the analyzers you want to use either in
the content of your field (index-time) or as part of your query, which means
no schema changes are necessary other than using a special field type for
the dynamic analyzer behavior. Something like the schema changes you
proposed would make it easier to use in most cases, though.

I've unfortunately done an awful job of keeping the JIRA moving along
toward getting it committed (busy schedule), but it's something you can
take a look at. Would be happy to collaborate with you if you're thinking
about doing work in this area.

All the best,

Trey Grainger
Co-Author, *Solr in Action*
SVP of Engineering @ Lucidworks

On Thu, Nov 23, 2017 at 11:03 AM, Doug Turnbull <
[email protected]> wrote:

> An alternate solution could be to create a fieldType that was a
> "FacadeTextField" that searches a real TextField field with a different
> query time analyzer. I.e., it would not have a physical representation in the
> index, but just provide a handle to a "field" that is searched with a
> different query time analyzer.
>
> For example, actor_nosyn is really a facade for searching "actor" with a
> different analyzer
>
> <!-- search actor field without synonyms -->
>   <field name="actor_nosyn" type="text_nosyn" facadeOf="actor"/>
>
> <!-- searches actor field as normal text field -->
>   <field name="actor" type="text" indexed="true" stored="true"/>
>
>
> <!-- Facade field type that places a different query time analyzer in
> front of another field -->
> <fieldType name="text_nosyn" class="solr.FacadeTextField" >
>     <analyzer type="query" >...</analyzer>
> </fieldType>
>
> <!-- fully fledged text field type -->
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="query" >...</analyzer>
>     <analyzer type="index" >...</analyzer>
> </fieldType>
>
> This would allow edismax and other query parsers to remain unchanged when
> searching, i.e.:
>
> q=action movies&qf=actor actor_nosyn title text&defType=edismax
>
>
>
> On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull <dturnbull@
> opensourceconnections.com> wrote:
>
>> I wonder if there's been any thought by the community to refactoring
>> fieldTypes to allow multiple query-time analyzers per indexed field?
>> Currently, to get different query-time analysis behavior you have to
>> duplicate a field. This is unfortunate duplication if, for example, I want
>> to search a field with query time synonyms on/off. For higher scale search
>> cases, allowing multiple query time analyzers against a single index field
>> can be invaluable. It's one reason I created the Match Query Parser (
>> https://github.com/o19s/match-query-parser) and a major feature of
>> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms).
>>
>> What I would propose is the ability to place multiple analyzers under a
>> field type. For example:
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>     <analyzer type="query" default="true" name="with_synonyms">...</analyzer>
>>     <analyzer type="query" name="without_synonyms">...</analyzer>
>>     <analyzer type="index">...</analyzer>
>> </fieldType>
>>
>> Notice how one query-time analyzer is "default" (and including only one
>> would make it the default)
>>
>> This would require allowing query parsers to pass the analyzer to use at
>> query time. I would propose introducing a syntax for configuring query
>> behavior per-field in edismax. Omitting this would continue to use the
>> default behavior/analyzer.
>>
>> For example, one could query title and text as usual:
>>
>> q=action movies&qf=actor title text&defType=edismax
>>
>> I would propose introducing a syntax whereby qf could refer to a kind of
>> pseudo field, configurable with a syntax similar to per-field facet settings.
>>
>> For example, below "actor_nosyn" and "actor_syn" actually search the same
>> physical field, but are configured with different analyzers
>>
>> q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax
>> &qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms
>> &qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
>>
>> Indeed, I would propose extending this syntax to control some of the
>> query-specific properties that currently are tied to the fieldType, such as
>>
>> q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax
>> &qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms
>> &qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
>> &qf.actor_nosyn.autoGeneratePhraseQueries=false
>>
>> I think this could be a pretty powerful syntax, but would require
>> refactoring of the field type and edismax (and possibly other query
>> parsers) quite a bit
>>
>> Any thoughts?
>>
>> Best
>> -Doug
>> --
>> Consultant, OpenSource Connections. Contact info at
>> http://o19s.com/about-us/doug-turnbull/; Free/Busy (
>> http://bit.ly/dougs_cal)
>>
> --
> Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (
> http://bit.ly/dougs_cal)
>
