Somewhat orthogonal here, but I’ve long thought that it would be useful to 
introduce named analyzers that could be referenced by name from potentially 
multiple field types.
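
To make the idea concrete, here is a minimal Python sketch of a schema in which analyzers are registered once under a name and field types only hold references to those names. Everything in it (`Schema`, `FieldType`, `add_analyzer`, the toy `lowercase_words` analyzer) is hypothetical and for illustration only; it is not Solr's API or an existing schema feature.

```python
# Hypothetical sketch: these classes are illustrative and not part of Solr.


def lowercase_words(text):
    """Toy analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


class FieldType:
    def __init__(self, name, analyzer_refs):
        self.name = name
        # Maps a role ("index", "query", ...) to the NAME of a shared analyzer.
        self.analyzer_refs = dict(analyzer_refs)


class Schema:
    def __init__(self):
        self.analyzers = {}    # analyzer name -> callable(text) -> list of tokens
        self.field_types = {}  # field type name -> FieldType

    def add_analyzer(self, name, analyzer):
        self.analyzers[name] = analyzer

    def add_field_type(self, field_type):
        # Reject references to analyzers that were never registered.
        for role, ref in field_type.analyzer_refs.items():
            if ref not in self.analyzers:
                raise KeyError(f"{field_type.name}: unknown analyzer {ref!r} for {role}")
        self.field_types[field_type.name] = field_type

    def analyze(self, type_name, role, text):
        # Look up the analyzer by name at use time, so a single definition
        # serves every field type that references it.
        ref = self.field_types[type_name].analyzer_refs[role]
        return self.analyzers[ref](text)


schema = Schema()
schema.add_analyzer("lowercase", lowercase_words)
# Two field types share one analyzer definition by name.
schema.add_field_type(FieldType("text_title", {"index": "lowercase", "query": "lowercase"}))
schema.add_field_type(FieldType("text_body", {"index": "lowercase", "query": "lowercase"}))
print(schema.analyze("text_body", "query", "Action Movies"))
```

The point of the indirection is that changing the one registered analyzer changes the behavior of every field type that refers to it, instead of editing several near-identical chains.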

--
Steve
www.lucidworks.com

> On Nov 24, 2017, at 10:17 AM, David Smiley <[email protected]> wrote:
> 
> Doug,
> 
> I think it would be wonderful if a FieldType had N analyzer chains instead of 
> exactly 3 (index, query, multiTerm).  Each chain could simply have a name.  
> The query parser could be configured to pick a particular chain by name.
> 
> I worked on a search project that had like a half dozen query analyzers, 
> which were also machine generated in code on the custom FieldType.  The query 
> parser, also custom, could then communicate with the FieldType to get the 
> particular analyzer that was appropriate for the use.
> 
> It's annoying (hard to maintain) to see repeated chains that are slightly 
> different.  I've wondered if it would be more maintainable to have one chain, 
> with some qualifier on each element to say which named chains it applies 
> to (if not all)?  I dunno; trade-offs, trade-offs.
> 
> ~ David
> 
> On Thu, Nov 23, 2017 at 11:03 AM Doug Turnbull 
> <[email protected]> wrote:
> An alternate solution could be to create a fieldType that was a 
> "FacadeTextField" that searches a real TextField field with a different query 
> time analyzer. I.e., it would not have a physical representation in the index, 
> but just provide a handle to a "field" that is searched with a different 
> query time analyzer.
> 
> For example, actor_nosyn is really a facade for searching "actor" with a 
> different analyzer:
> 
> <!-- search actor field without synonyms -->
>   <field name="actor_nosyn" type="text_nosyn" facadeOf="actor"/>
> 
> <!-- searches actor field as normal text field -->
>   <field name="actor" type="text" indexed="true" stored="true"/>
> 
> 
> <!-- Facade field type that places a different query time analyzer in front 
> of another field -->
> <fieldType name="text_nosyn" class="solr.FacadeTextField" >
>     <analyzer type="query" >...</analyzer>
> </fieldType>
> 
> <!-- fully fledged text field type -->
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="query" >...</analyzer>
>     <analyzer type="index" >...</analyzer>
> </fieldType>
> 
> This would allow edismax and other query parsers to remain unchanged when 
> searching, i.e.:
> 
> q=action movies&qf=actor actor_nosyn title text&defType=edismax
> 
> 
> 
> On Thu, Nov 23, 2017 at 10:50 AM Doug Turnbull 
> <[email protected]> wrote:
> I wonder if there's been any thought by the community to refactoring 
> fieldTypes to allow multiple query-time analyzers per indexed field? 
> Currently, to get different query-time analysis behavior you have to 
> duplicate a field. This is unfortunate duplication if, for example, I want to 
> search a field with query time synonyms on/off. For higher scale search 
> cases, allowing multiple query time analyzers against a single index field 
> can be invaluable. It's one reason I created the Match Query Parser 
> (https://github.com/o19s/match-query-parser) and a major feature of 
> hon-lucene-synonyms (https://github.com/healthonnet/hon-lucene-synonyms).
> 
> What I would propose is the ability to place multiple analyzers under a field 
> type. For example:
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="query" default="true" name="with_synonyms">...</analyzer>
>     <analyzer type="query" name="without_synonyms">...</analyzer>
>     <analyzer type="index">...</analyzer>
> </fieldType>
> 
> Notice how one query-time analyzer is "default" (and including only one would 
> make it the default)
> 
> This would require allowing query parsers to pass the analyzer to use at query 
> time. I would propose introducing a syntax for configuring query behavior 
> per-field in edismax. Omitting this would continue to use the default 
> behavior/analyzer.
> 
> For example, one could query title and text as usual:
> 
> q=action movies&qf=actor title text&defType=edismax
> 
> I would propose introducing a syntax whereby qf could refer to a kind of 
> pseudo field, configurable with a syntax similar to per-field facet settings.
> 
> For example, below "actor_nosyn" and "actor_syn" actually search the same 
> physical field, but are configured with different analyzers:
> 
> q=action movies&qf=actor_syn actor_nosyn^10 title 
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms
> 
> Indeed, I would propose extending this syntax to control some of the 
> query-specific properties that currently are tied to the fieldType, such as
> 
> q=action movies&qf=actor_syn actor_nosyn^10 title 
> text&defType=edismax&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms&qf.actor_nosyn.autoGeneratePhraseQueries=false
> 
> I think this could be a pretty powerful syntax, but it would require quite a 
> bit of refactoring of the field type and edismax (and possibly other query 
> parsers).
> 
> Any thoughts?
> 
> Best
> -Doug
> -- 
> Consultant, OpenSource Connections. Contact info at 
> http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
> -- 
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: 
> http://www.solrenterprisesearchserver.com
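
As a rough illustration of the per-field `qf` parameters proposed in the first message of the quoted thread, the following Python sketch resolves each `qf` entry to a physical field, an analyzer name, a boost, and any remaining per-field options. The parameter names `qf.<alias>.field` and `qf.<alias>.analyzer` come from the proposal; the function `resolve_qf` and the shape of its result are assumptions made for this example, and nothing here reflects how edismax actually parses its parameters.

```python
from urllib.parse import parse_qs


def resolve_qf(query_string):
    """Resolve every qf entry to its field, analyzer, boost and extra options.

    Hypothetical helper for the proposed syntax; not part of Solr.
    """
    # Keep the last value if a parameter is repeated.
    params = {key: values[-1] for key, values in parse_qs(query_string).items()}
    resolved = []
    for entry in params.get("qf", "").split():
        alias, _, boost = entry.partition("^")
        prefix = "qf." + alias + "."
        # Collect qf.<alias>.* overrides, keyed by the part after the prefix.
        overrides = {
            key[len(prefix):]: value
            for key, value in params.items()
            if key.startswith(prefix)
        }
        resolved.append({
            "alias": alias,
            # Without an override, the alias is the physical field itself.
            "field": overrides.pop("field", alias),
            # None means "use the field type's default query analyzer".
            "analyzer": overrides.pop("analyzer", None),
            "boost": float(boost) if boost else 1.0,
            # Anything left over, e.g. autoGeneratePhraseQueries.
            "options": overrides,
        })
    return resolved


example = (
    "q=action movies&qf=actor_syn actor_nosyn^10 title text&defType=edismax"
    "&qf.actor_nosyn.field=actor&qf.actor_nosyn.analyzer=without_synonyms"
    "&qf.actor_syn.field=actor&qf.actor_syn.analyzer=with_synonyms"
    "&qf.actor_nosyn.autoGeneratePhraseQueries=false"
)
for entry in resolve_qf(example):
    print(entry)
```

Entries without any `qf.<alias>.*` parameters, such as `title` and `text`, fall through to their own field and default analyzer, which matches the proposal's statement that omitting the new parameters keeps the current behavior.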


