luman created TIKA-4368:
---------------------------

Summary: Unable to correctly extract content in OneNote
Key: TIKA-4368
URL: https://issues.apache.org/jira/browse/TIKA-4368
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 3.0.0, 4.0.0
Reporter: luman
Attachments: Multilingual.one, Onenote-Screenshot.jpg, Tika-gui-Screenshot.jpg

# Non-rich-text content is not checked against the latest version, so content of type TextExtendedAscii is still parsed repeatedly.
# Time parsing does not check the version either, so time values may be extracted repeatedly.
# Dates are not parsed.
# Non-ASCII characters are not extracted correctly:
## Garbled text
## No parsing performed

The attachments include the original OneNote file, a screenshot of the OneNote app, and a screenshot of the Tika GUI app.
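
The report does not state the root cause of the garbled text. As a rough illustration only, the sketch below shows one common way such output arises: bytes stored in one encoding are decoded with a different, single-byte charset. The sample string, the choice of UTF-16LE as the stored encoding, and the helper name `decode_as` are all assumptions made for this example; none of them is taken from the Tika OneNote parser.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes with the given charset, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")


# Hypothetical multilingual sample; not taken from the attached Multilingual.one.
sample = "多语言"

# Assumption for illustration: the text is stored as UTF-16LE.
raw = sample.encode("utf-16-le")

# Decoding with the matching charset round-trips the text.
correct = decode_as(raw, "utf-16-le")

# Decoding the same bytes as a single-byte charset yields one character
# per byte, which is unreadable for non-ASCII input.
garbled = decode_as(raw, "latin-1")

print(correct == sample)
print(garbled == sample)
print(len(raw), len(garbled))
```

If the garbling in the attached screenshots has this shape (two unrelated characters per expected character), a charset mismatch for TextExtendedAscii content would be one thing to check. That is a guess, not a diagnosis.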