Somewhat orthogonal here, but I’ve long thought that it would be useful to introduce named analyzers that could be referenced by name from potentially multiple field types.
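A minimal sketch of the named-analyzer idea, assuming a simple registry that field types point into by name. Everything here (`NAMED_ANALYZERS`, `FIELD_TYPES`, `analyzer_for`, the toy synonym map) is hypothetical and only illustrates the shape of the idea; it is not Solr's API or schema syntax.

```python
from typing import Callable, Dict, List

# An "analyzer" here is just a function from text to tokens (an assumption
# made for illustration; real Lucene analyzers are far richer).
Analyzer = Callable[[str], List[str]]

SYNONYMS = {"film": ["film", "movie"]}  # toy synonym map, hypothetical


def plain(text: str) -> List[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def with_synonyms(text: str) -> List[str]:
    """Same as plain(), but expand tokens found in the synonym map."""
    tokens: List[str] = []
    for token in plain(text):
        tokens.extend(SYNONYMS.get(token, [token]))
    return tokens


# Analyzers are defined once, under a name ...
NAMED_ANALYZERS: Dict[str, Analyzer] = {
    "plain": plain,
    "with_synonyms": with_synonyms,
}

# ... and any number of field types refer to them by that name.
FIELD_TYPES: Dict[str, Dict[str, str]] = {
    "text": {"index": "plain", "query": "with_synonyms"},
    "text_nosyn": {"index": "plain", "query": "plain"},
}


def analyzer_for(field_type: str, role: str) -> Analyzer:
    """Look up the analyzer a field type uses for a role (index or query)."""
    return NAMED_ANALYZERS[FIELD_TYPES[field_type][role]]
```

Both field types above share the single `plain` definition for indexing, so a change to that chain would only need to be made in one place.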
--
Steve
www.lucidworks.com

> On Nov 24, 2017, at 10:17 AM, David Smiley <[email protected]> wrote:
>
> Doug,
>
> I think it would be wonderful if a FieldType had N analyzer chains instead of
> exactly 3 (index, query, multiTerm). Each chain could simply have a name.
> The query parser could be configured to pick a particular chain by name.
>
> I worked on a search project that had about half a dozen query analyzers,
> which were also machine generated in code on the custom FieldType. The query
> parser, also custom, could then communicate with the FieldType to get the
> particular analyzer that was appropriate for the use.
>
> It's annoying (hard to maintain) to see repeated chains that are slightly
> different. I've wondered if it would be more maintainable to have one chain,
> with some qualifier on each element to say which named chains it applies
> to (if not all)? I dunno; trade-offs, trade-offs.
>
> ~ David
>
> On Thu, Nov 23, 2017 at 11:03 AM Doug Turnbull
> <[email protected]> wrote:
> An alternate solution could be to create a fieldType that was a
> "FacadeTextField" that searches a real TextField field with a different query
> time analyzer. I.e., it would not have a physical representation in the index,
> but just provide a handle to a "field" that is searched with a different
> query time analyzer.
>
> For example, actor_nosyn is really a facade for searching "actor" with a
> different analyzer:
>
> <!-- search actor field without synonyms -->
> <field name="actor_nosyn" type="text_nosyn" facadeOf="actor"/>
>
> <!-- searches actor field as normal text field -->
> <field name="actor" type="text" indexed="true" stored="true"/>
>
>
> <!-- Facade field type that places a different query time analyzer in front
> of another field -->
> <fieldType name="text_nosyn" class="solr.FacadeTextField" >
>   <analyzer type="query" >...</analyzer>
> </fieldType>
>
> <!-- fully fledged text field type -->
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="query" >...</analyzer>
>   <analyzer type="index" >...</analyzer>
> </fieldType>
>
> This would allow edismax and other query parsers to keep searching
> unchanged, i.e.:
>
> q=action movies&qf=actor actor_nosyn title text&defType=edismax
>
>
>
> On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull
> <[email protected]> wrote:
> I wonder if there's been any thought by the community to refactoring
> fieldTypes to allow multiple query-time analyzers per indexed field?
> Currently, to get different query-time analysis behavior you have to
> duplicate a field. This is unfortunate duplication if, for example, I want to
> search a field with query time synonyms on/off. For higher scale search
> cases, allowing multiple query time analyzers against a single index field
> can be invaluable. It's one reason I created the Match Query Parser
> (https://github.com/o19s/match-query-parser) and a major feature of
> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms)
>
> What I would propose is the ability to place multiple analyzers under a field
> type. For example:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="query" default="true" name="with_synonyms">...</analyzer>
>   <analyzer type="query" name="without_synonyms">...</analyzer>
>   <analyzer type="index">...</analyzer>
> </fieldType>
>
> Notice how one query-time analyzer is "default" (and including only one would
> make it the default).
>
> This would require allowing query parsers to pass the analyzer to use at query
> time. I would propose introducing a syntax for configuring query behavior
> per-field in edismax. Omitting this would continue to use the default
> behavior/analyzer.
>
> For example, one could query title and text as usual:
>
> q=action movies&qf=actor title text&defType=edismax
>
> I would propose introducing a syntax whereby qf could refer to a kind of
> pseudo field, configurable with a syntax similar to per-field facet settings.
>
> For example, below "actor_nosyn" and "actor_syn" actually search the same
> physical field, but are configured with different analyzers:
>
> q=action movies&qf=actor_syn actor_nosyn^10 title
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
>
> Indeed, I would propose extending this syntax to control some of the
> query-specific properties that currently are tied to the fieldType, such as
>
> q=action movies&qf=actor_syn actor_nosyn^10 title
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms&qf.actor_nosyn.autoGeneratePhraseQueries=false
>
> I think this could be a pretty powerful syntax, but it would require quite a
> bit of refactoring of the field type and edismax (and possibly other query
> parsers).
>
> Any thoughts?
>
> Best
> -Doug
> --
> Consultant, OpenSource Connections. Contact info at
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
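For readers following the thread, here is a rough sketch of how the per-field `qf.<alias>.field` / `qf.<alias>.analyzer` parameters from Doug's first message might be resolved before the query is built. This is an assumption about one possible reading of the proposal, not edismax behavior: `QueryField` and `resolve_qf` are hypothetical names, and an alias with no overrides is assumed to fall back to a physical field of the same name using its default query analyzer.

```python
from typing import Dict, List, NamedTuple, Optional


class QueryField(NamedTuple):
    alias: str               # name as it appears in qf
    field: str               # physical index field that is actually searched
    analyzer: Optional[str]  # named query analyzer; None means "use the default"
    boost: float


def resolve_qf(params: Dict[str, str]) -> List[QueryField]:
    """Map each qf entry to a physical field and a named analyzer."""
    resolved: List[QueryField] = []
    for entry in params.get("qf", "").split():
        # "actor_nosyn^10" -> alias "actor_nosyn", boost "10"
        alias, _, boost = entry.partition("^")
        resolved.append(
            QueryField(
                alias=alias,
                # Without an override, the alias is itself the field name.
                field=params.get(f"qf.{alias}.field", alias),
                analyzer=params.get(f"qf.{alias}.analyzer"),
                boost=float(boost) if boost else 1.0,
            )
        )
    return resolved


# The request from the thread, already split into decoded parameters.
params = {
    "q": "action movies",
    "defType": "edismax",
    "qf": "actor_syn actor_nosyn^10 title text",
    "qf.actor_nosyn.field": "actor",
    "qf.actor_nosyn.analyzer": "without_synonyms",
    "qf.actor_syn.field": "actor",
    "qf.actor_syn.analyzer": "with_synonyms",
}
fields = resolve_qf(params)
```

Here `actor_syn` and `actor_nosyn` both resolve to the physical `actor` field with different analyzers, while `title` and `text` pass through untouched, which matches the "omitting this would continue to use the default" behavior described in the proposal.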
