On Wed, 14 Nov 2012, 122jxgcn wrote:
Has anyone worked on extracting content from MS OneNote files (*.one)? It would be great if someone could tell me how to parse OneNote files programmatically.

I'm not aware of anything. The good news is that the file format is fully documented:
http://msdn.microsoft.com/en-us/library/dd924743%28v=office.12%29.aspx
http://msdn.microsoft.com/en-us/library/dd951288%28v=office.12%29.aspx

You'll need to use the specification to write some code that reads the format. Then you can hook that code up to Tika. My hunch is you're looking at 5-15 days of work.
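
As a possible first step, here is a minimal Java sketch that only checks whether a file starts with the OneNote section header GUID. The GUID value, the class name and the method names are my assumptions for illustration, not anything that exists in Tika or POI, so check the value against the specification linked above before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper, not part of Tika or POI.
final class OneNoteSniffer {

    // Assumed guidFileType value for .one section files; verify against the spec.
    static final UUID ONE_FILE_TYPE =
            UUID.fromString("7B5C52E4-D88C-4DA7-AEB1-5378D02996D3");

    // Serialises a GUID in the usual Windows on-disk layout: the first three
    // fields are little-endian, the last eight bytes are stored as written.
    static byte[] toGuidBytes(UUID guid) {
        long msb = guid.getMostSignificantBits();
        long lsb = guid.getLeastSignificantBits();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt((int) (msb >>> 32));      // Data1 (4 bytes)
        buf.putShort((short) (msb >>> 16));  // Data2 (2 bytes)
        buf.putShort((short) msb);           // Data3 (2 bytes)
        buf.order(ByteOrder.BIG_ENDIAN);
        buf.putLong(lsb);                    // Data4 (8 bytes)
        return buf.array();
    }

    // Returns true if the first 16 bytes match the assumed .one file type GUID.
    static boolean looksLikeOneNoteSection(byte[] fileStart) {
        if (fileStart == null || fileStart.length < 16) {
            return false;
        }
        byte[] expected = toGuidBytes(ONE_FILE_TYPE);
        return Arrays.equals(Arrays.copyOf(fileStart, 16), expected);
    }
}
```

Reading the first 16 bytes of a file and passing them to looksLikeOneNoteSection would give a cheap format check before any real parsing starts. Everything after the header is where the actual work is.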

Apache POI would probably be a good home for most of the OneNote code if you do get it working. Please consider contributing it there if you make progress!

Nick
