I think we can already expose this information with a small tweak to
SynonymGraphFilter, using the existing TypeAttribute.

SGF is hard-coded to set the type attribute to “SYNONYM” on every token it
inserts into the stream.  It should be simple to add a constructor
parameter that lets users change this; then you can chain synonym filters,
one for each type of expansion you want (synonym, hyponym, hypernym, whatever),
each setting the type attribute differently.
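A toy, JDK-only model of that idea, not Lucene code: `Token`, `ExpansionStage`, `BOOST_BY_TYPE` and `demo` are hypothetical names. Each stage stands in for a synonym filter whose inserted-token type is a constructor argument rather than the hard-coded "SYNONYM". Stages are chained one per kind of expansion, and a query-side consumer turns the type label into a boost. Positions, position lengths and multi-word rules are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypedExpansionSketch {

    // Simplified stand-in for a token: term text plus a type label.
    // "word" mirrors Lucene's TypeAttribute.DEFAULT_TYPE.
    record Token(String text, String type) {}

    // Hypothetical stage modelling a synonym filter whose inserted-token type
    // is a constructor parameter instead of a hard-coded "SYNONYM".
    // Positions, position lengths and multi-word rules are ignored here.
    record ExpansionStage(Map<String, List<String>> rules, String insertedType) {
        List<Token> apply(List<Token> in) {
            List<Token> out = new ArrayList<>();
            for (Token t : in) {
                out.add(t); // keep the incoming token unchanged
                for (String e : rules.getOrDefault(t.text(), List.of())) {
                    // every inserted token carries this stage's type label
                    out.add(new Token(e, insertedType));
                }
            }
            return out;
        }
    }

    // Hypothetical query-side policy: translate each type label into a boost.
    static final Map<String, Float> BOOST_BY_TYPE =
        Map.of("word", 1.0f, "SYNONYM", 0.8f, "HYPERNYM", 0.5f);

    static List<Token> demo() {
        ExpansionStage synonyms =
            new ExpansionStage(Map.of("puppy", List.of("pup")), "SYNONYM");
        ExpansionStage hypernyms =
            new ExpansionStage(Map.of("puppy", List.of("dog")), "HYPERNYM");
        // Chain the stages, one per kind of expansion.
        return hypernyms.apply(synonyms.apply(List.of(new Token("puppy", "word"))));
    }

    public static void main(String[] args) {
        for (Token t : demo()) {
            float boost = BOOST_BY_TYPE.getOrDefault(t.type(), 1.0f);
            System.out.println(t.text() + " " + t.type() + " " + boost);
        }
    }
}
```

Running it prints `puppy word 1.0`, `dog HYPERNYM 0.5` and `pup SYNONYM 0.8`. In a real analysis chain the boost mapping would live in a query builder subclass reading TypeAttribute, not in the analysis chain itself.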

> On 28 Nov 2018, at 15:59, Michael Gibney <[email protected]> wrote:
> 
> I think the objection to "boosting" in token filters isn't because it
> is "too much", but rather because it breaks the abstraction of the
> analysis chain to directly target scoring (as implied by
> characterizing as "boosting").
> 
> That said, I'm sympathetic to an approach that would establish an
> Attribute to expose the kind of information that would be useful in
> the context of synonyms (or other sorts of derived tokens discussed
> here, where it could be useful to express information about token
> derivation). Such an Attribute would not be directly related to
> scoring/boosting, but would be related to analysis per se (e.g.,
> source token text, thesaurus, degree of confidence, etc.); support
> could be selectively implemented by TokenFilters, and optionally
> leveraged by query builders (e.g., translated to boosts) or even
> recorded to index Payloads by a final custom analysis component ....
> 
> "You can look at any attribute on the tokenstream you want", "rely on
> abstract attributes (type, ...) then it should be easy to sub-class
> the query builder to access them".  Obviously that works iff analysis
> components record the relevant information in attributes on the
> tokenstream, which I think they currently don't (for much of the
> information that has been discussed here) ... and I know of no
> standard way to express the relevant information on the tokenstream.
> 
> I can see that such an Attribute would be out of place (too
> specialized) in the context of the Attributes in lucene/core; but
> there are lots of more specialized Attributes in the various
> submodules under lucene/analysis/* (SynonymGraphFilter lives in
> analysis-common, FWIW). Again, this doesn't strike me as terribly
> specialized, if one thinks of it more generally as a
> "derivation/relationship" Attribute.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

