[ https://issues.apache.org/jira/browse/LUCENE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487770#comment-13487770 ]
Oliver Christ commented on LUCENE-4518:
---------------------------------------

In a classic FST, once you have consumed the input and start looking for completions, you know how many input symbols you consumed, how many output symbols you have collected so far, and how many symbols the TopNSearcher appends (i.e. how long, in characters, the completed portion of the string is). That information should be sufficient to explicitly distinguish the two parts.

As long as completions don't "surround" the user-entered portion (google: "sox ticket purc") and the prefix doesn't, for some reason, end in the "middle" of a UTF-8 byte sequence, this may be enough to cover basic use cases and to put the length of the covered prefix (or completed suffix) into each LookupResult.

I'm assuming that the input and output symbols are "reasonably aligned" in the transition labels, which may not be the case in the current implementation (I haven't gotten to that level of detail yet :-( ).

> Suggesters: highlighting (explicit markup of user-typed portions vs.
> generated portions in a suggestion)
> --------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4518
> URL: https://issues.apache.org/jira/browse/LUCENE-4518
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Oliver Christ
>
> As a user, I would like the lookup result of the suggestion engine to contain
> information which allows me to distinguish the user-entered portion from the
> autocompleted portion of a suggestion. That information can then be used for
> e.g. highlighting.
> *Notes:*
> It's trivial if the suggestion engine only applies simple prefix search, as
> then the user-typed prefix is always a true prefix of the completion.
> However, it's non-trivial as soon as you use an AnalyzingSuggester, where the
> completion may (in extreme cases) be quite different from the user-provided
> input.
> As soon as case/diacritics folding, script adaptation (kanji/hiragana)
> come into play, the completion is no longer guaranteed to be an extension of
> the query. Since the caller of the suggestion engine (UI) generally does not
> know the implementation details, the required information needs to be passed
> in the LookupResult.
> *Discussion on java-user:*
> > I haven't found a simple solution for the highlighting yet,
> > particularly when using AnalyzingSuggester (where it's non-trivial).
> Mike McCandless:
> Ahh I see ... it is challenging in that case. Hmm. Maybe open an issue for
> this as well, so we can discuss/iterate?
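
To make the proposal in the comment concrete, here is a minimal sketch of how a caller might use a "covered prefix length" if a lookup result carried one. Everything in it is an assumption for illustration: the class HighlightSplit and its methods are hypothetical and not part of Lucene, and the prefix length is assumed to be reported in UTF-8 bytes. The sketch only handles the basic case described above, where the user-entered portion is a leading part of the suggestion. If the byte offset falls in the "middle" of a UTF-8 sequence, it declines to split.

```java
/**
 * Hypothetical helper, not part of Lucene. It assumes a suggester could report
 * how many UTF-8 bytes of a suggestion were covered by the user's input.
 */
final class HighlightSplit {
    private HighlightSplit() {}

    /** Number of bytes UTF-8 needs to encode one Unicode code point. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /**
     * Maps a UTF-8 byte offset to a Java char offset in the suggestion.
     * Returns -1 if the offset is negative, lies past the end of the string,
     * or falls inside a multi-byte sequence. Assumes a well-formed string
     * (no unpaired surrogates).
     */
    static int charOffsetForUtf8Prefix(String suggestion, int prefixBytes) {
        if (prefixBytes < 0) return -1;
        int bytes = 0;
        int i = 0;
        while (i < suggestion.length() && bytes < prefixBytes) {
            int cp = suggestion.codePointAt(i);
            bytes += utf8Length(cp);
            i += Character.charCount(cp);
        }
        // Only an exact match means the prefix ends on a code point boundary.
        return bytes == prefixBytes ? i : -1;
    }

    /** Wraps the covered prefix in <b>...</b>; returns the input unchanged if no clean split exists. */
    static String markup(String suggestion, int prefixBytes) {
        int split = charOffsetForUtf8Prefix(suggestion, prefixBytes);
        if (split < 0) return suggestion;
        return "<b>" + suggestion.substring(0, split) + "</b>" + suggestion.substring(split);
    }
}
```

A UI could then call HighlightSplit.markup(suggestion, prefixBytes) on each result. Completions that "surround" the typed text, or analyzers that reorder or rewrite the input, would need richer information than a single length.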