Hi all,

We are using the UAX29URLEmailTokenizer so we can use the token types in our 
plugins.


However, I noticed that the tokenizer is not detecting certain URLs as <URL> 
but <ALPHANUM> instead.


Examples that are not working:


example.com is <ALPHANUM>

example.net is <ALPHANUM>


But:

https://example.com is <URL> as is https://example.net.


Examples that work:

example.ch is <URL>

example.co.uk is <URL>

example.nl is <URL>


I have checked the JIRA, and could not find an issue. I have tested this on 
Lucene (Solr) 6.4.1 and 7.3.


Could someone confirm my findings and advise what I could do to (help) resolve 
this issue?


/JZ

<https://example.com>


Reply via email to