[ 
https://issues.apache.org/jira/browse/TIKA-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782242#comment-17782242
 ] 

Alexey Pismenskiy commented on TIKA-3828:
-----------------------------------------

Bumping and voting for this; I am seeing this issue too.

[[email protected]] any chance you can look into it? 

> OneNote Parser - Parsed Files are Missing Parts of the Content
> --------------------------------------------------------------
>
>                 Key: TIKA-3828
>                 URL: https://issues.apache.org/jira/browse/TIKA-3828
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1, 1.28.4
>            Reporter: Gordon Vidal
>            Priority: Major
>         Attachments: TestSection1 (1).one, TikaParserErrorScreenshot.png
>
>
> OneNote files that I receive from SharePoint Online are currently not parsed 
> correctly. See the attached screenshot and OneNote section file.
> I have been able to consistently reproduce this issue by doing the following:
>  * Create a OneNote document with multiple sections.
>  * Edit the OneNote document using the option "Open in Desktop App" and make 
> changes in different sections, saving between edits. I have used both OneNote 
> 2016 (Version 1808) and OneNote 2021 (Version 2108).
>  * Download a section of the OneNote document using the SharePoint Online 
> REST API.
> I will be investigating this issue myself as well. The Tika codebase is quite 
> new to me, so any information about the status of this bug, its potential 
> cause, and any plans to fix it would be very welcome.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
