I like that idea, Alan. The trick is that for QueryBuilder's 'newSynonymQuery'
to be useful in that context, you need to pass terms along with their metadata
down to the subclass. This is what I started working on a few weeks ago:

https://github.com/o19s/lucene-solr/commit/0fc3930671ef002cfbb5e3d52b6f8edc3715bf14

I don't think it's as simple as overriding analyzeBoolean/analyzeMultiBoolean
as Rob suggests, as there's also analyzeGraphBoolean and the  that would
also need to collect this metadata. I wouldn't want to copy paste all this
code into a subclass just to add one token attribute.
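
To make the shape of the idea concrete, here is a rough, standalone sketch. It
is not Lucene code and not what the commit above does: TypedTerm, BaseBuilder,
TypeAwareBuilder, the type labels other than "SYNONYM", and the weights are all
hypothetical stand-ins. The point is only that if every analysis path hands
term-plus-type pairs to a single hook, a subclass can override that one hook
instead of copying each analyze* method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only: these classes mimic the shape of the idea and are
// NOT Lucene's QueryBuilder API.
class TypedSynonymSketch {

    /** A term plus the token type the analysis chain assigned to it. */
    static final class TypedTerm {
        final String text;
        final String type; // "SYNONYM" is from the thread; other labels are assumed

        TypedTerm(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Stand-in for the base builder: collects typed terms, then calls one hook. */
    static class BaseBuilder {
        // Default behaviour ignores the type and weights every term equally.
        protected Map<String, Float> newSynonymQuery(List<TypedTerm> terms) {
            Map<String, Float> weights = new LinkedHashMap<>();
            for (TypedTerm t : terms) {
                weights.put(t.text, 1.0f);
            }
            return weights;
        }

        // In this sketch every code path funnels through the same hook,
        // so the metadata only has to be collected in one place.
        final Map<String, Float> build(List<TypedTerm> samePositionTerms) {
            return newSynonymQuery(new ArrayList<>(samePositionTerms));
        }
    }

    /** Subclass that turns token types into per-term weights. */
    static class TypeAwareBuilder extends BaseBuilder {
        @Override
        protected Map<String, Float> newSynonymQuery(List<TypedTerm> terms) {
            Map<String, Float> weights = new LinkedHashMap<>();
            for (TypedTerm t : terms) {
                weights.put(t.text, weightFor(t.type));
            }
            return weights;
        }

        // Arbitrary example weights; a real application would tune these.
        static float weightFor(String type) {
            switch (type) {
                case "SYNONYM":
                    return 0.8f;
                case "HYPERNYM":
                    return 0.5f;
                default:
                    return 1.0f;
            }
        }
    }
}
```

With that in place, the original term keeps full weight while tokens typed
"SYNONYM" or "HYPERNYM" are scored lower, and the base class needs no changes
beyond passing the type through.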

-Doug



On Wed, Nov 28, 2018 at 12:25 PM Alan Woodward <[email protected]> wrote:

> I think we can expose this information now with a small tweak to the
> SynonymGraphFilter, using the already-existing TypeAttribute.
>
> SGF is hard-coded to set the type attribute to “SYNONYM” on all tokens
> that it inserts into the stream.  It should be simple to add another
> constructor parameter allowing users to change this; then you can chain
> synonym filters, one for each type of expansion you want: synonym, hyponym,
> hypernym, whatever, each setting the type attribute differently.
>
> > On 28 Nov 2018, at 15:59, Michael Gibney <[email protected]> wrote:
> >
> > I think the objection to "boosting" in token filters isn't because it
> > is "too much", but rather because it breaks the abstraction of the
> > analysis chain to directly target scoring (as implied by
> > characterizing as "boosting").
> >
> > That said, I'm sympathetic to an approach that would establish an
> > Attribute to expose the kind of information that would be useful in
> > the context of synonyms (or other sorts of derived tokens discussed
> > here, where it could be useful to express information about token
> > derivation). Such an Attribute would not be directly related to
> > scoring/boosting, but would be related to analysis per se, (e.g.,
> > source token text, thesaurus, degree of confidence, etc.); support
> > could be selectively implemented by TokenFilters, and optionally
> > leveraged by query builders (e.g., translated to boosts) or even
> > recorded to index Payloads by a final custom analysis component ....
> >
> > "You can look at any attribute on the tokenstream you want", "rely on
> > abstract attributes (type, ...) then it should be easy to sub-class
> > the query builder to access them".  Obviously that works iff analysis
> > components record the relevant information in attributes on the
> > tokenstream, which I think they currently don't (for much of the
> > information that has been discussed here) ... and I know of no
> > standard way to express the relevant information on the tokenstream.
> >
> > I can see that such an Attribute would be out of place (too
> > specialized) in the context of the Attributes in lucene/core; but
> > there are lots of more specialized Attributes in the various
> > submodules under lucene/analysis/* (SynonymGraphFilter lives in
> > analysis-common, FWIW). Again, this doesn't strike me as terribly
> > specialized, if one thinks of it more generally as a
> > "derivation/relationship" Attribute.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
> --
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug
