I wonder if there's been any thought in the community about refactoring
fieldTypes to allow multiple query-time analyzers per indexed field?
Currently, to get different query-time analysis behavior you have to
duplicate a field (and index the same content twice). This is an unfortunate
duplication if, for example, I want to search a field with query-time
synonyms on and off. For larger-scale search applications, allowing multiple
query-time analyzers against a single indexed field can be invaluable. It's
one reason I created the Match Query Parser
(https://github.com/o19s/match-query-parser), and it's a major feature of
hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms).

What I would propose is the ability to place multiple query-time analyzers
under a single field type. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="query" default="true" name="with_synonyms">...</analyzer>
    <analyzer type="query" name="without_synonyms">...</analyzer>
    <analyzer type="index">...</analyzer>
</fieldType>

Notice that one query-time analyzer is marked as the default (if only one
is declared, it would implicitly be the default).

This would require allowing query parsers to specify which analyzer to use
at query time. I would propose introducing a syntax for configuring query
behavior per field in edismax. Omitting it would continue to use the default
behavior/analyzer.
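Here's a rough sketch of how the analyzer choice could be resolved, written
in Python only for brevity. QueryAnalyzerDecl, pick_default and resolve are
hypothetical names, not existing Solr or Lucene APIs. An analyzer marked
default wins, a lone analyzer is implicitly the default, and a query that
names no analyzer gets the default. Treating two defaults, or several
analyzers with no default, as a configuration error is an assumption of this
sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryAnalyzerDecl:
    """One <analyzer type="query"> element under a hypothetical multi-analyzer fieldType."""
    name: str
    default: bool = False


def pick_default(decls: list[QueryAnalyzerDecl]) -> str:
    """Return the name of the fieldType's default query analyzer."""
    if not decls:
        raise ValueError("fieldType declares no query analyzer")
    marked = [d.name for d in decls if d.default]
    # Assumption: more than one default="true" is a config error.
    if len(marked) > 1:
        raise ValueError(f"more than one default query analyzer: {marked}")
    if marked:
        return marked[0]
    # A single declared query analyzer is the default implicitly.
    if len(decls) == 1:
        return decls[0].name
    # Assumption: several analyzers with none marked default is a config error.
    raise ValueError("several query analyzers declared but none marked default")


def resolve(decls: list[QueryAnalyzerDecl], requested: Optional[str] = None) -> str:
    """Use the analyzer the query parser asked for, or the default if it asked for none."""
    if requested is None:
        return pick_default(decls)
    if requested not in {d.name for d in decls}:
        raise KeyError(f"unknown query analyzer: {requested}")
    return requested


text_type = [
    QueryAnalyzerDecl("with_synonyms", default=True),
    QueryAnalyzerDecl("without_synonyms"),
]
```

With the "text" fieldType above, resolve(text_type) picks with_synonyms, while
resolve(text_type, "without_synonyms") honors the explicit request.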

For example, one could query title and text as usual:

q=action movies&qf=actor title text&defType=edismax

I would propose introducing a syntax whereby qf could refer to a kind of
pseudo field, configurable with a syntax similar to per-field facet
parameters (f.<field>.facet.limit and so on).

For example, below, "actor_nosyn" and "actor_syn" both search the same
physical field, actor, but are configured with different analyzers:

q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
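A sketch of how edismax might expand those qf entries, again in Python and
assuming the request params arrive as a flat dict of single values. parse_qf
and QfClause are hypothetical. An alias with no qf.<alias>.field entry is
treated as a real field name, an alias with no qf.<alias>.analyzer gets the
fieldType's default analyzer (represented as None), and an unboosted entry
gets boost 1.0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QfClause:
    alias: str               # name as written in qf
    field: str               # physical index field actually searched
    analyzer: Optional[str]  # named query analyzer; None means the fieldType default
    boost: float


def parse_qf(params: dict[str, str]) -> list[QfClause]:
    """Expand qf, resolving pseudo fields via qf.<alias>.field and qf.<alias>.analyzer."""
    clauses = []
    for token in params.get("qf", "").split():
        alias, _, boost = token.partition("^")
        clauses.append(QfClause(
            alias=alias,
            # Without qf.<alias>.field, the alias is just a real field name.
            field=params.get(f"qf.{alias}.field", alias),
            analyzer=params.get(f"qf.{alias}.analyzer"),
            boost=float(boost) if boost else 1.0,
        ))
    return clauses


# The example request above, as a params dict.
params = {
    "q": "action movies",
    "defType": "edismax",
    "qf": "actor_syn actor_nosyn^10 title text",
    "qf.actor_nosyn.field": "actor",
    "qf.actor_nosyn.analyzer": "without_synonyms",
    "qf.actor_syn.field": "actor",
    "qf.actor_syn.analyzer": "with_synonyms",
}
clauses = parse_qf(params)
```

Both actor clauses end up on the same physical field, so the index only holds
actor once while the two clauses differ in analyzer and boost.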

Indeed, I would propose extending this syntax to control some of the
query-specific properties that are currently tied to the fieldType, such as
autoGeneratePhraseQueries:
q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms&qf.actor_nosyn.autoGeneratePhraseQueries=false
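And a sketch of the per-alias override: start from the fieldType's query-time
settings and let qf.<alias>.<property> replace them for that alias only.
effective_props and the OVERRIDABLE whitelist are hypothetical, and only
boolean properties such as autoGeneratePhraseQueries are handled here.

```python
# Hypothetical whitelist of boolean query-time properties that an alias may override.
OVERRIDABLE = {"autoGeneratePhraseQueries"}


def effective_props(alias: str, fieldtype_props: dict[str, bool],
                    params: dict[str, str]) -> dict[str, bool]:
    """Copy the fieldType's settings, then apply qf.<alias>.<prop> overrides."""
    props = dict(fieldtype_props)  # never mutate the shared fieldType defaults
    prefix = f"qf.{alias}."
    for key, raw in params.items():
        if not key.startswith(prefix):
            continue
        prop = key[len(prefix):]
        # .field and .analyzer are handled elsewhere; only whitelisted props apply.
        if prop in OVERRIDABLE:
            props[prop] = raw.strip().lower() == "true"
    return props


fieldtype_defaults = {"autoGeneratePhraseQueries": True}
request = {
    "qf": "actor_syn actor_nosyn^10 title text",
    "qf.actor_nosyn.field": "actor",
    "qf.actor_nosyn.autoGeneratePhraseQueries": "false",
}
```

One design note: in this sketch a misspelled alias simply matches nothing, so
a real implementation would probably want to reject qf.<alias>.* keys whose
alias doesn't appear in qf.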

I think this could be a pretty powerful syntax, but it would require
refactoring the field type and edismax (and possibly other query parsers)
quite a bit.

Any thoughts?

Best
-Doug
-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
