Tim Allison created TIKA-4376:
---------------------------------
Summary: tika-eval should tokenize on non-breaking/narrow/other space variants
Key: TIKA-4376
URL: https://issues.apache.org/jira/browse/TIKA-4376
Project: Tika
Issue Type: Task
Components: tika-eval
Reporter: Tim Allison
See TIKA-4375. Many thanks to [~tilman] for identifying this issue and
supplying this link:
[https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128]
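To illustrate the requested behavior, here is a minimal sketch in plain Java. It is not tika-eval's actual tokenizer. The class name SpaceVariantSplit, the tokenize method, and the choice of regex are assumptions made for illustration only. The linked table starts at code point 8192 (U+2000), which is where many of these space variants are defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of tika-eval.
public class SpaceVariantSplit {

    // \p{Z} matches the Unicode separator categories (Zs, Zl, Zp), which include
    // U+00A0 NO-BREAK SPACE, U+2000..U+200A, and U+202F NARROW NO-BREAK SPACE.
    // \s adds ASCII whitespace such as tab and newline.
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Z}\\s]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : SEPARATORS.split(text)) {
            // split() yields a leading empty string when the text starts with a separator
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // no-break space, narrow no-break space, thin space, then a plain space
        String sample = "one\u00A0two\u202Fthree\u2009four five";
        System.out.println(tokenize(sample)); // prints [one, two, three, four, five]
    }
}
```

Note that U+200B ZERO WIDTH SPACE is a format character (category Cf), not a separator, so this pattern does not split on it. Whether it should count as a token boundary is left open here.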