Need some help parsing OneNote files

Nicholas DiPiazza Tue, 03 Dec 2019 04:59:36 -0800

Dear POI team.

I was referred here by Nick Burch from the Apache Tika dev list.
I am building a OneNote parser for the Apache Tika project and I've hit a brick
wall. I've tried to read the MS-ONE and MS-ONESTORE specs, but they are
quite hard to follow. What I really need is an expert in this file
format who can help me understand why I'm missing several key pieces of
information.

The issue is described here at length:
https://stackoverflow.com/questions/59008205/onenote-parsing-how-to-get-to-the-text-blobs-in-the-document

Here is the GitHub link to the project as it stands so far:
https://github.com/dropbox/onenote-parser

I seem to be misunderstanding a key part of parsing the object spaces. I
can read the root file nodes, and from there the parser descends into a
hierarchy of child nodes, but I am not able to get the text from all of my
pages.

If someone here knows this format, could you run the unit tests and
help me figure out why I'm missing the elements that contain the rich
text? If anyone is available to set up a Zoom session with me, that would
be awesome.

-Nicholas DiPiazza

