Hi,

The highlighter should indeed highlight it only once since they share the
same offsets. Can you provide us with a full curl recreation?
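
For reference, the settings fragment quoted below could be assembled into a complete request body along these lines. This is only a sketch: the analyzer name `prefix_analyzer` and the helper `build_settings` are assumptions for illustration and do not appear in the original message, and the original index may wire these pieces together differently.

```python
import json


def build_settings():
    """Assemble index settings from the tokenizer and filter quoted in this thread.

    The analyzer entry is hypothetical: the original message does not show
    how the tokenizer and filter were combined.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # Taken from the quoted message.
                    "prefix_tokenizer": {
                        "type": "edgeNGram",
                        "min_gram": 1,
                        "max_gram": 30,
                        "token_chars": ["letter", "digit", "symbol", "punctuation"],
                    }
                },
                "filter": {
                    # Taken from the quoted message.
                    "my_word_delimiter": {
                        "type": "word_delimiter",
                        "type_table": ["# => ALPHANUM"],
                    }
                },
                "analyzer": {
                    # Assumed name and wiring, for illustration only.
                    "prefix_analyzer": {
                        "type": "custom",
                        "tokenizer": "prefix_tokenizer",
                        "filter": ["my_word_delimiter"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print a JSON body that could be pasted into an index-creation request.
    print(json.dumps(build_settings(), indent=2))
```

The printed JSON could serve as the body of the index-creation step in a recreation, followed by a sample document containing "U.S. politics" and a highlighted search for "U.S. pol".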


On Sun, Apr 27, 2014 at 2:51 AM, Hieu Nguyen <[email protected]> wrote:

> Hello guys,
> I have been using the edgeNGram tokenizer to enable partial prefix
> matching on a query. However, the tokenizer treats certain characters as
> punctuation (e.g. C# => C, I/O => I and O), so I had to add the "punctuation"
> character class to the edgeNGram tokenizer and use the word_delimiter
> filter to drop punctuation.
>
>     'tokenizer': {
>         'prefix_tokenizer': {
>             'type': 'edgeNGram',
>             'min_gram': 1,
>             'max_gram': 30,
>             'token_chars': ['letter', 'digit', 'symbol', 'punctuation'],
>         },
>     },
>
>     'filter': {
>         'my_word_delimiter': {
>             'type': 'word_delimiter',
>             'type_table': [
>                 '# => ALPHANUM'
>             ]
>         }
>     }
>
> Unfortunately, this causes the highlight snippets to contain duplicate
> tokens. For example, when the query is "U.S. pol" and the matching document
> contains "U.S. politics", the snippet is: <em>*U*</em><em>*U.S*</em>.
> <em>Pol</em>itics (the letter U is highlighted twice). I see how the word
> delimiter creates the same token for different prefixes ("U" tokens for
> "U" and "U."), but the highlighting seems strange to me because "U" and
> "U.S" have the same offset.
>
> Do you have any suggestions?
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a88b15ae-bfb8-419b-a58c-f3e7c8556faa%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand
