Re: Need some help parsing OneNote files

Nicholas DiPiazza Tue, 03 Dec 2019 05:00:37 -0800

Correction: my project is actually here:
https://github.com/nddipiazza/onenote-parser-java
I started out by porting some source code over from this C++ project:
https://github.com/dropbox/onenote-parser


On Tue, Dec 3, 2019 at 6:58 AM Nicholas DiPiazza <
[email protected]> wrote:

> Dear POI team.
>
> I was referred here by Nick Burch from the Apache Tika dev list.
>
> I am building a OneNote parser for the Apache Tika project and I've hit a
> brick wall. I've tried to read the MS-ONE and MS-ONESTORE specs but they
> are quite hard to read. What I really need here is an expert in this file
> format who can help me understand why I'm missing several key pieces of
> information.
>
> The issue is described here at length:
> https://stackoverflow.com/questions/59008205/onenote-parsing-how-to-get-to-the-text-blobs-in-the-document
>
> Here is the github link of the project I have going so far:
> https://github.com/dropbox/onenote-parser
>
> I appear to be misunderstanding a key part of parsing the object spaces.
> I have my root file nodes, and they lead into a hierarchy of child nodes.
> But I am not able to get the text from all of my pages.
>
> If there is someone who knows this format, can you run the unit tests and
> try to help me find out why I'm missing the elements that contain the rich
> text? Perhaps if anyone is available to set up a zoom session with me,
> that'd be awesome.
>
> -Nicholas DiPiazza
>
