luman created TIKA-4368:
---------------------------

             Summary: Unable to correctly extract content in OneNote
                 Key: TIKA-4368
                 URL: https://issues.apache.org/jira/browse/TIKA-4368
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 3.0.0, 4.0.0
            Reporter: luman
         Attachments: Multilingual.one, Onenote-Screenshot.jpg, 
Tika-gui-Screenshot.jpg

 # Non-rich-text content is not checked against the latest version, so content of type TextExtendedAscii is still parsed repeatedly.
 # Time parsing does not detect the version either, so time values may be extracted repeatedly.
 # Dates are not parsed.
 # Non-ASCII characters are not extracted correctly:
 ## Some text comes out garbled.
 ## Some text is not parsed at all.
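
The garbled output could have several causes, and this report does not identify one. As a hedged illustration only, the sketch below shows one common way non-ASCII text gets garbled: bytes written as UTF-16LE are decoded with a single-byte charset. It assumes the affected text is stored as UTF-16LE, which has not been verified against Multilingual.one. The class CharsetMismatchDemo, its decode helper, and the sample string are hypothetical and are not part of Tika.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tika or its OneNote parser.
public class CharsetMismatchDemo {

    // Decode raw bytes using the given charset.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Sample non-ASCII text (three CJK characters), written with escapes
        // so the source file encoding does not matter. It is NOT taken from
        // the attached Multilingual.one file.
        String original = "\u591a\u8bed\u8a00";

        // Assumption: the text is stored on disk as UTF-16LE.
        byte[] raw = original.getBytes(StandardCharsets.UTF_16LE);

        // Decoding with the matching charset round-trips the text.
        String correct = decode(raw, StandardCharsets.UTF_16LE);

        // Decoding the same bytes as a single-byte charset yields one
        // character per byte, which no longer resembles the original.
        String garbled = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("matches original (UTF-16LE):   " + original.equals(correct));
        System.out.println("matches original (ISO-8859-1): " + original.equals(garbled));
        System.out.println("garbled length: " + garbled.length());
    }
}
```

If the parser behaves like the second decode call for some text runs, that would be consistent with the garbled output in the Tika screenshot, but this needs to be confirmed in the parser code.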

The attachments include the original OneNote file, a screenshot of the OneNote app, 
and a screenshot of the TikaGUI app.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
