Hi,

The highlighter should indeed highlight the "U" only once, since both tokens share the same offsets. Can you provide us with a full curl recreation?
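To illustrate what "highlight only once" means for matches with shared offsets, here is a minimal Python sketch. It is not Elasticsearch's highlighter code; the function names `merge_spans` and `highlight` and the character spans are hypothetical, picked to mirror the "U.S. Politics" example from the quoted message.

```python
def merge_spans(spans):
    """Merge overlapping or identical (start, end) character spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def highlight(text, spans, pre="<em>", post="</em>"):
    """Wrap each merged span of `text` in pre/post tags."""
    out = []
    cursor = 0
    for start, end in merge_spans(spans):
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Assumed match spans: "U" (0-1), "U.S" (0-3), "Pol" (5-8).
text = "U.S. Politics"
spans = [(0, 1), (0, 3), (5, 8)]
print(highlight(text, spans))  # <em>U.S</em>. <em>Pol</em>itics
```

Because the "U" and "U.S" spans start at the same position, they collapse into one highlighted region instead of producing two adjacent `<em>` fragments.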
On Sun, Apr 27, 2014 at 2:51 AM, Hieu Nguyen <[email protected]> wrote:

> Hello guys,
>
> I have been using the edgeNGram tokenizer to enable partial prefix
> matching on a query. However, the tokenizer treats certain characters as
> punctuation (e.g. C# => C, I/O => I and O), so I had to add the
> "punctuation" character class to the edgeNGram tokenizer and use the
> word_delimiter filter to drop punctuation.
>
>     'tokenizer': {
>         'prefix_tokenizer': {
>             'type': 'edgeNGram',
>             'min_gram': 1,
>             'max_gram': 30,
>             'token_chars': ['letter', 'digit', 'symbol', 'punctuation'],
>         },
>     }
>
>     'filter': {
>         'my_word_delimiter': {
>             'type': 'word_delimiter',
>             'type_table': [
>                 '# => ALPHANUM'
>             ]
>         }
>     }
>
> Unfortunately, this causes the highlight snippets to contain duplicate
> tokens. For example, when the query is "U.S. pol" and the matching
> document contains "U.S. politics", the snippet comes out as follows:
>
>     <em>U</em><em>U.S</em>. <em>Pol</em>itics
>
> (the letter U is highlighted twice). I see how word_delimiter creates the
> same token for different prefixes ("U" tokens for "U" and "U."), but the
> highlighting seems strange to me because "U" and "U.S" have the same
> offset.
>
> Do you have any suggestions?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a88b15ae-bfb8-419b-a58c-f3e7c8556faa%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Adrien Grand
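For anyone following the thread, the duplicate "U" tokens described in the original message can be reproduced with a rough Python approximation. This is only a sketch and not Lucene's actual edgeNGram or word_delimiter logic; `edge_ngrams` and `split_on_delimiters` are hypothetical helpers, and the only rule assumed for the delimiter step is "split on anything that is not a letter, a digit, or `#`" (to mimic the `# => ALPHANUM` entry in `type_table`).

```python
import re


def edge_ngrams(word, min_gram=1, max_gram=30):
    """Return the prefixes of `word` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


def split_on_delimiters(token):
    """Rough stand-in for word_delimiter: split on non-alphanumerics.

    '#' is kept as part of a word, mirroring '# => ALPHANUM'.
    """
    return [part for part in re.split(r"[^0-9A-Za-z#]+", token) if part]


for gram in edge_ngrams("U.S."):
    print(repr(gram), "->", split_on_delimiters(gram))
# 'U' -> ['U']
# 'U.' -> ['U']
# 'U.S' -> ['U', 'S']
# 'U.S.' -> ['U', 'S']
```

Under these assumptions every prefix contributes its own "U" token, which is consistent with the repeated highlight reported above.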
