[
https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453780#comment-17453780
]
ASF GitHub Bot commented on TIKA-3446:
--------------------------------------
nddipiazza opened a new pull request #460:
URL: https://github.com/apache/tika/pull/460
# Support parsing OneNote files when downloaded from SharePoint Online
See section 2.8 of this document
https://interoperability.blob.core.windows.net/files/MS-ONESTORE/%5bMS-ONESTORE%5d.pdf
which explains that MS-ONESTORE documents can also be encoded according to
the following spec:
https://interoperability.blob.core.windows.net/files/MS-FSSHTTPB/%5bMS-FSSHTTPB%5d.pdf
The problem was that our Tika OneNote parser was of limited use, because it
only worked for files saved from OnPrem versions of OneNote.
> OneNote - look into adding support for OneNote 365 documents
> ------------------------------------------------------------
>
> Key: TIKA-3446
> URL: https://issues.apache.org/jira/browse/TIKA-3446
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.27
> Reporter: Nicholas DiPiazza
> Assignee: Nicholas DiPiazza
> Priority: Major
>
> While doing some parsing of OneNote documents, I was investigating a slew of
> them that did not seem to parse very well.
> When I did some digging, I found out that these documents were generated from
> SharePoint Online.
> I had hoped that OneNote documents generated from SharePoint Online would
> just be the same as OnPrem OneNote documents from 2016, 2019, etc.
> But it turns out this is NOT the case.
> I checked out the Microsoft specification MS-ONESTORE and found that the
> documents do not match the specifications that are published.
> I opened a community post:
> [Looking for the MS spec for OneNote 365 version - Microsoft Q&A|https://docs.microsoft.com/en-us/answers/questions/436943/looking-for-the-ms-spec-for-onenote-365-version-1.html]
> I also opened an internal ticket with Microsoft.
> They will be responding soon with an analysis of my issue and we'll see if
> there is anything we can do.