nddipiazza opened a new pull request #461: URL: https://github.com/apache/tika/pull/461
# Support parsing OneNote files when downloaded from O365

The previous version of the Tika OneNote parser could not handle files saved from Office 365 (SharePoint Online, OneDrive).

See section 2.8 of the MS-ONESTORE specification:

https://interoperability.blob.core.windows.net/files/MS-ONESTORE/%5bMS-ONESTORE%5d.pdf

That section describes how MS-ONESTORE documents can be encoded according to the MS-FSSHTTPB specification:

https://interoperability.blob.core.windows.net/files/MS-FSSHTTPB/%5bMS-FSSHTTPB%5d.pdf

With this change, users who get files from the O365 suite can use the OneNote parser.

# Things to improve later

* Stream the input instead of using a byte array?
* See whether this newer parser code can also be used for the on-prem documents, to avoid the code bloat?
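
On the first item under "Things to improve later", here is a rough sketch of what streaming instead of buffering could look like. It is not code from this PR or from Tika: the class `StreamingReadSketch` and both `readUInt32LE` methods are hypothetical names invented for illustration, and the sketch assumes the fields being read are little-endian 32-bit unsigned integers. The array variant needs the whole file in memory, while the stream variant only consumes the bytes it needs.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of the Tika OneNote parser.
final class StreamingReadSketch {

    // Array variant: the caller must already hold the whole file in memory.
    public static long readUInt32LE(byte[] data, int offset) {
        return (data[offset] & 0xFFL)
                | ((data[offset + 1] & 0xFFL) << 8)
                | ((data[offset + 2] & 0xFFL) << 16)
                | ((data[offset + 3] & 0xFFL) << 24);
    }

    // Stream variant: consumes exactly four bytes from the stream and
    // never needs more than one field in memory at a time.
    public static long readUInt32LE(InputStream in) {
        try {
            long value = 0;
            for (int i = 0; i < 4; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("stream ended inside a 4-byte field");
                }
                value |= ((long) b) << (8 * i);
            }
            return value;
        } catch (IOException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};

        // Same bytes read both ways; the two variants should agree.
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(Long.toHexString(readUInt32LE(in)));
        System.out.println(Long.toHexString(readUInt32LE(in)));
        System.out.println(Long.toHexString(readUInt32LE(data, 0)));
        System.out.println(Long.toHexString(readUInt32LE(data, 4)));
    }
}
```

Whether this pays off for the parser depends on how often it needs to seek backwards in the file; a purely forward stream like the one above cannot do that without extra buffering.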
