nddipiazza opened a new pull request #461:
URL: https://github.com/apache/tika/pull/461


   # Support parsing OneNote files when downloaded from O365
   
   Previous versions of the Tika OneNote parser could not handle files saved 
from Office 365 (SharePoint Online, OneDrive).
   
   See section 2.8 of the MS-ONESTORE specification:
   
https://interoperability.blob.core.windows.net/files/MS-ONESTORE/%5bMS-ONESTORE%5d.pdf
   
   That section describes how MS-ONESTORE documents can alternatively be 
encoded using the MS-FSSHTTPB format: 
   
https://interoperability.blob.core.windows.net/files/MS-FSSHTTPB/%5bMS-FSSHTTPB%5d.pdf
   
   With this change, OneNote files retrieved from the O365 suite can be 
parsed by the OneNote parser. 
   
   # Things to improve later
   
   * Stream the input instead of reading it into a byte array?
   * See whether this newer parser code can also handle the on-prem documents, 
to avoid code bloat from maintaining two code paths?
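   
To make the first item concrete, here is a minimal sketch of the difference between buffering a whole document and reading only the next fixed-size field from a stream. It is not the parser code from this PR: `StreamingSketch`, `readWholeFile`, and `readField` are hypothetical names, the 16-byte field length is only an illustrative choice, and the sketch uses nothing beyond the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not part of the Tika OneNote parser.
public class StreamingSketch {

    // Buffered approach: the entire document is held in memory before parsing.
    static byte[] readWholeFile(InputStream in) throws IOException {
        return in.readAllBytes(); // requires Java 9+
    }

    // Streaming approach: consume only the fixed-size field needed next.
    // readFully throws EOFException if the stream ends before `length` bytes.
    static byte[] readField(InputStream in, int length) throws IOException {
        byte[] field = new byte[length];
        new DataInputStream(in).readFully(field);
        return field;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocument = new byte[1024]; // stand-in for a real file
        try (InputStream in = new ByteArrayInputStream(fakeDocument)) {
            byte[] header = readField(in, 16);
            System.out.println(header.length + " bytes read, "
                    + in.available() + " bytes left unread");
        }
    }
}
```

The trade-off is that a streaming reader cannot seek backwards, so a format that refers to earlier offsets would still need some buffering or a seekable source.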
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


