luman created TIKA-4368:
---------------------------
Summary: Unable to correctly extract content in OneNote
Key: TIKA-4368
URL: https://issues.apache.org/jira/browse/TIKA-4368
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 3.0.0, 4.0.0
Reporter: luman
Attachments: Multilingual.one, Onenote-Screenshot.jpg,
Tika-gui-Screenshot.jpg
# Non-rich-text content is not checked to see whether it belongs to the latest
version, so content stored as TextExtendedAscii is still parsed repeatedly.
# Time values are parsed without detecting the version, so they may be
extracted repeatedly.
# Dates are not parsed.
# Non-ASCII characters are not extracted correctly:
## Garbled text
## No parsing performed
The attachments include the original OneNote file, a screenshot of the OneNote
app, and a screenshot of the TikaGUI app.
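
For readers unfamiliar with how garbled non-ASCII text typically arises, here is
a minimal sketch using only the Java standard library. It is an assumption, not
a diagnosis of this issue: it supposes the text bytes are UTF-16LE and are
decoded with a single-byte charset. The class name CharsetMismatchDemo and the
helper decode are hypothetical and are not part of Tika.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Hypothetical helper: decode raw bytes with the given charset.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Three CJK characters, written as Unicode escapes so the source
        // file encoding does not matter.
        String original = "\u591A\u8BED\u8A00";

        // Assumption: the stored bytes are UTF-16LE.
        byte[] raw = original.getBytes(StandardCharsets.UTF_16LE);

        // Decoding with the matching charset recovers the text.
        String right = decode(raw, StandardCharsets.UTF_16LE);

        // Decoding with a single-byte charset turns each byte into its own
        // character, which is one way garbled output can appear.
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(right)); // prints "true"
        System.out.println(original.equals(wrong)); // prints "false"
        System.out.println(wrong.length());         // prints "6"
    }
}
```

Whether this is the actual cause here would need to be confirmed against the
attached Multilingual.one file.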