Thanks Steve, Trey, David, and Mikhail

Lots of great ideas. It seems like there's some consensus around:

- Creating multiple named analyzers per field
- Referencing those analyzers by name at query time somehow

I would advocate for refactoring edismax (or making a new query parser)
so that it allows you to specify per-field query configuration. Then I would
advocate moving some of the fieldType flags (autoGeneratePhraseQueries, etc.)
into this query-time config. Then we could follow suit, using the same syntax
to specify the analyzer to use at query time.

Perhaps more generally, these configuration items can stay on the fieldType,
but a syntax could allow them to be overridden per field at query time?
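
As a rough sketch of that override idea: the fieldType supplies defaults, and the request overrides only the flags it names. Everything here is hypothetical (the function name, the default values, the "queryAnalyzer" key); none of it is existing Solr code.

```python
# Hypothetical sketch: fieldType-level query settings act as defaults, and
# per-field request settings override them. None of these names are Solr APIs,
# and the default values are assumed for illustration only.

FIELD_TYPE_DEFAULTS = {
    "text": {"autoGeneratePhraseQueries": True, "queryAnalyzer": "with_synonyms"},
}


def effective_query_config(field_type, overrides):
    """Return the fieldType defaults with any query-time overrides applied."""
    config = dict(FIELD_TYPE_DEFAULTS[field_type])  # copy so defaults stay untouched
    config.update(overrides)  # query-time settings win over fieldType settings
    return config


print(effective_query_config("text", {"queryAnalyzer": "no_synonyms"}))
```

Anything the request does not mention keeps its fieldType value, so existing queries would behave as they do today.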

Finally, another requirement I would add would be the ability to specify
the same field twice in qf, but configured to be queried two different
ways. Perhaps a syntax like qf=title:config1 title:config2, where config1
and config2 modify fieldType query flags? Like
fieldConfig.config1.autoGeneratePhraseQueries=false&fieldConfig.config1.queryAnalyzer=no_synonyms
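
To make that concrete, here is a minimal sketch of how a parser might read such a request. The qf=field:config form and the fieldConfig.* parameters are only the syntax proposed in this mail, not something Solr supports today, and the helper names are made up for illustration.

```python
from urllib.parse import parse_qs


def parse_field_configs(query_string):
    """Group fieldConfig.<name>.<setting>=<value> params by config name."""
    params = parse_qs(query_string)
    configs = {}
    for key, values in params.items():
        parts = key.split(".", 2)
        if len(parts) == 3 and parts[0] == "fieldConfig":
            _, name, setting = parts
            configs.setdefault(name, {})[setting] = values[-1]
    return params, configs


def parse_qf(params):
    """Split qf into (field, config name or None) pairs."""
    pairs = []
    for entry in " ".join(params.get("qf", [])).split():
        field, _, config = entry.partition(":")
        pairs.append((field, config or None))
    return pairs


# Hypothetical request using the proposed (not yet existing) syntax.
request = (
    "qf=title:config1 title:config2"
    "&fieldConfig.config1.autoGeneratePhraseQueries=false"
    "&fieldConfig.config1.queryAnalyzer=no_synonyms"
)
params, configs = parse_field_configs(request)
for field, config in parse_qf(params):
    # A config with no fieldConfig.* params falls back to an empty override set.
    print(field, config, configs.get(config, {}))
```

The same physical field shows up twice in qf, each time paired with its own set of overrides, which is the behavior I'm after.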

In my opinion, this sort of thing would both enhance the power of Solr
and give it a more consistent vision for how field-specific query settings
could be organized.

Best
-Doug

On Fri, Nov 24, 2017 at 3:25 PM Steve Rowe <[email protected]> wrote:

> Somewhat orthogonal here, but I’ve long thought that it would be useful to
> introduce named analyzers that could be referenced by name from potentially
> multiple field types.
>
> --
> Steve
> www.lucidworks.com
>
> > On Nov 24, 2017, at 10:17 AM, David Smiley <[email protected]>
> wrote:
> >
> > Doug,
> >
> > I think it would be wonderful if a FieldType had N analyzer chains
> instead of exactly 3 (index, query, multiTerm).  Each chain could simply
> have a name.  The query parser could be configured to pick a particular
> chain by name.
> >
> > I worked on a search project that had like a half dozen query analyzers,
> which were also machine generated in code on the custom FieldType.  The
> query parser, also custom, could then communicate with the FieldType to get
> the particular analyzer that was appropriate for the use.
> >
> > It's annoying (hard to maintain) to see repeated chains that are
> slightly different.  I've wondered if it would be more maintainable to have
> one chain, with some qualifier on each element to say which named chains
> it applies to (if not all)?  I dunno; trade-offs, trade-offs.
> >
> > ~ David
> >
> > On Thu, Nov 23, 2017 at 11:03 AM Doug Turnbull <
> [email protected]> wrote:
> > An alternate solution could be to create a fieldType that was a
> "FacadeTextField" that searches a real TextField field with a different
> query time analyzer. I.e., it would not have a physical representation in the
> index, but just provide a handle to a "field" that is searched with a
> different query time analyzer.
> >
> > For example, actor_nosyn is really a facade for searching "actor" with a
> different analyzer
> >
> > <!-- search actor field without synonyms -->
> >   <field name="actor_nosyn" type="text_nosyn" facadeOf="actor"/>
> >
> > <!-- searches actor field as normal text field -->
> >   <field name="actor" type="text" indexed="true" stored="true"/>
> >
> >
> > <!-- Facade field type that places a different query time analyzer in
> front of another field -->
> > <fieldType name="text_nosyn" class="solr.FacadeTextField" >
> >     <analyzer type="query" >...</analyzer>
> > </fieldType>
> >
> > <!-- fully fledged text field type -->
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="query" >...</analyzer>
> >     <analyzer type="index" >...</analyzer>
> > </fieldType>
> >
> > This would allow edismax and other query parsers to remain unchanged
> when searching, i.e.:
> >
> > q=action movies&qf=actor actor_nosyn title text&defType=edismax
> >
> >
> >
> > On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull <
> [email protected]> wrote:
> > I wonder if there's been any thought by the community to refactoring
> fieldTypes to allow multiple query-time analyzers per indexed field?
> Currently, to get different query-time analysis behavior you have to
> duplicate a field. This is unfortunate duplication if, for example, I want
> to search a field with query time synonyms on/off. For higher scale search
> cases, allowing multiple query time analyzers against a single index field
> can be invaluable. It's one reason I created the Match Query Parser (
> https://github.com/o19s/match-query-parser) and a major feature of
> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms )
> >
> > What I would propose is the ability to place multiple analyzers under a
> field type. For example:
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="query" default="true"
> name="with_synonyms">...</analyzer>
> >     <analyzer type="query" name="without_synonyms">...</analyzer>
> >     <analyzer type="index">...</analyzer>
> > </fieldType>
> >
> > Notice how one query-time analyzer is "default" (and including only one
> would make it the default)
> >
> > This would require allowing query parsers to pass the analyzer to use at
> query time. I would propose introducing a syntax for configuring query
> behavior per-field in edismax. Omitting this would continue to use the
> default behavior/analyzer.
> >
> > For example, one could query title and text as usual:
> >
> > q=action movies&qf=actor title text&defType=edismax
> >
> > I would propose introducing a syntax whereby qf could refer to a kind of
> pseudo field, configurable with a syntax similar to per-field facet settings
> >
> > For example, below "actor_nosyn" and "actor_syn" actually search the
> same physical field, but are configured with different analyzers
> >
> > q=action movies&qf=actor_syn actor_nosyn^10 title
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
> >
> > Indeed, I would propose extending this syntax to control some of the
> query-specific properties that currently are tied to the fieldType, such as
> >
> > q=action movies&qf=actor_syn actor_nosyn^10 title
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms&qf.actor_nosyn.autoGeneratePhraseQueries=false
> >
> > I think this could be a pretty powerful syntax, but would require
> refactoring of the field type and edismax (and possibly other query
> parsers) quite a bit
> >
> > Any thoughts?
> >
> > Best
> > -Doug
> > --
> > Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (
> http://bit.ly/dougs_cal)
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
>
> --
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
