Correction: my project is actually here: https://github.com/nddipiazza/onenote-parser-java

I started out by porting some source code over from this C++ project: https://github.com/dropbox/onenote-parser

On Tue, Dec 3, 2019 at 6:58 AM Nicholas DiPiazza <nicholas.dipia...@gmail.com> wrote:

> Dear POI team.
>
> I was referred to here by Nick Burch from the Apache Tika dev list.
>
> I am building a OneNote parser for apache tika project and I've hit a
> brick wall. I've tried to read the MS-ONE and MS-ONESTORE specs but they
> are quite hard to read. What I really need here is an expert in this file
> format that can help me understand why I'm missing several key pieces of
> information.
>
> The issue is described here at length:
> https://stackoverflow.com/questions/59008205/onenote-parsing-how-to-get-to-the-text-blobs-in-the-document
>
> Here is the github link of the project I have going so far:
> https://github.com/dropbox/onenote-parser
>
> I appear not to be understanding a key part of parsing the object spaces.
> I have my root file nodes and it goes into a hierarchy of child nodes. But
> I am not able to get the text from all of my pages.
>
> If there is someone who knows this format, can you run the unit tests and
> try to help me find out why I'm missing the elements that contain the rich
> text? Perhaps if anyone is available to set up a zoom session with me,
> that'd be awesome.
>
> -Nicholas DiPiazza
>