[ https://issues.apache.org/jira/browse/TIKA-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782242#comment-17782242 ]
Alexey Pismenskiy commented on TIKA-3828:
-----------------------------------------
Bump and vote for this - seeing this issue too...
[[email protected]] any chance you can look into it?
> OneNote Parser - Parsed Files are Missing Parts of the Content
> --------------------------------------------------------------
>
> Key: TIKA-3828
> URL: https://issues.apache.org/jira/browse/TIKA-3828
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.1, 1.28.4
> Reporter: Gordon Vidal
> Priority: Major
> Attachments: TestSection1 (1).one, TikaParserErrorScreenshot.png
>
>
> OneNote files that I receive from SharePoint Online are currently not parsed
> correctly. See the attached screenshot and OneNote section file.
> I have been able to consistently reproduce this issue by doing the following:
> * Create a OneNote Document with multiple sections.
> * Edit the OneNote Document using the option "Open in Desktop App" and make
> changes in different sections, saving between edits. I have used both OneNote
> 2016 (Version 1808) and OneNote 2021 (Version 2108).
> * Download a section of the OneNote Document using the SharePoint Online
> REST API.
> I will be investigating this issue myself as well. The Tika codebase is quite
> new to me, so any information about the status of this bug, its potential
> cause, and any plans to fix it would be very welcome.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)