[ https://issues.apache.org/jira/browse/SOLR-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697053#comment-14697053 ]
Jan Høydahl commented on SOLR-7926:
-----------------------------------
Hi.
This kind of question is better suited to the solr-user mailing list. Most
likely this is not a bug. Please ask the question on the list, and also say
which highlighter implementation you use, with what configuration, and why you
expect it to do what you want (with a reference to the documentation). I'll
close this JIRA issue as "Invalid".
If it turns out to be a suspected bug, or you find that the result you want is
not easily configurable with any of the existing highlighter implementations,
then please re-open.
> Hit highlighting with EdgeNGramFilterFactory
> --------------------------------------------
>
> Key: SOLR-7926
> URL: https://issues.apache.org/jira/browse/SOLR-7926
> Project: Solr
> Issue Type: Bug
> Components: highlighter
> Affects Versions: 5.1, 5.2.1
> Environment: CentOS 7 (5.2.1), OS X 10.10.5 (5.1)
> Reporter: Bjørn Hjelle
> Priority: Critical
> Labels: EdgeNGramTokenFilter, highlighting
>
> Hit highlighting highlights the whole word, not just the part that matches
> the search term, when EdgeNGramFilterFactory is used in the field type.
> In schema.xml I have the field type text_ngram:
> <fieldType name="text_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!--tokenizer class="solr.StandardTokenizerFactory"/-->
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory"
>             maxGramSize="20" minGramSize="3" luceneMatchVersion="4.3"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^\w\d\*æ?~F?~E])" replacement="" replace="all"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^\w\d\*æ?~F?~E])" replacement="" replace="all"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>   </analyzer>
> </fieldType>
> In the Solr Admin Analysis screen, with index value "lucene" and query value
> "luc", it shows this:
> LENGTF  text            luc         luce           lucen             lucene
>         raw_bytes       [6c 75 63]  [6c 75 63 65]  [6c 75 63 65 6e]  [6c 75 63 65 6e 65]
>         start           0           0              0                 0
>         end             6           6              6                 6
>         positionLength  1           1              1                 1
>         type            word        word           word              word
>         position        1           1              1                 1
> Since the end offset is 6 in this case, the whole word ("lucene") is
> highlighted.
>
> If I change to use NGramFilterFactory it shows me this (for the first three
> items):
> LENGTF  text            luc         uce            cen
>         raw_bytes       [6c 75 63]  [6c 75 63 65]  [6c 75 63 65 6e]
>         start           0           1              2
>         end             3           4              5
>         positionLength  1           1              1
>         type            word        word           word
>         position        1           1              1
> The end offset is correct then, and the highlighter highlights only the
> search term. Note that I have specified luceneMatchVersion="4.3". Without
> this, the end offsets go back to 6 for NGramFilterFactory as well.
>
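
For readers following the report, here is a rough sketch of why the end offset
matters. It is a toy model in Python, not Lucene or Solr code: the Token type
and the functions edge_ngrams_word_offsets, edge_ngrams_gram_offsets and
highlight are hypothetical names made up for illustration. It assumes a
highlighter that simply wraps the span from a matching token's start offset to
its end offset, which is a simplification of what the real highlighter
implementations do.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # start offset into the original field value
    end: int    # end offset (exclusive) into the original field value


def edge_ngrams_word_offsets(word: str, min_gram: int, max_gram: int) -> List[Token]:
    """Front edge n-grams that all keep the offsets of the whole word,
    like the start=0 / end=6 rows in the first analysis output."""
    top = min(max_gram, len(word))
    return [Token(word[:n], 0, len(word)) for n in range(min_gram, top + 1)]


def edge_ngrams_gram_offsets(word: str, min_gram: int, max_gram: int) -> List[Token]:
    """Hypothetical variant where each gram carries its own end offset."""
    top = min(max_gram, len(word))
    return [Token(word[:n], 0, n) for n in range(min_gram, top + 1)]


def highlight(value: str, tokens: List[Token], query: str) -> str:
    """Wrap the offsets of the first token whose text equals the query
    (assumption: a deliberately simplified stand-in for a highlighter)."""
    for tok in tokens:
        if tok.text == query:
            return (value[:tok.start] + "<em>" + value[tok.start:tok.end]
                    + "</em>" + value[tok.end:])
    return value


whole = highlight("lucene", edge_ngrams_word_offsets("lucene", 3, 20), "luc")
part = highlight("lucene", edge_ngrams_gram_offsets("lucene", 3, 20), "luc")
print(whole)
print(part)
```

In this model the first call wraps the whole word and the second wraps only
the matching prefix, which mirrors the difference between the two analysis
outputs in the report. Whether a given highlighter implementation can be
configured to produce the second result is a separate question that the sketch
does not answer.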