On Wed, 14 Nov 2012, 122jxgcn wrote:
Has anyone worked on extracting content from MS OneNote files (*.one)?
It would be great if someone could tell me how to parse OneNote files
programmatically.
I'm not aware of anything. The good news is that the file format is fully
documented:
http://msdn.microsoft.com/en-us/library/dd924743%28v=office.12%29.aspx
http://msdn.microsoft.com/en-us/library/dd951288%28v=office.12%29.aspx
You'll need to use the specification to write some code to read the
format. Once that works, you can feed the extracted content to Tika. My
hunch is you're looking at 5-15 days of work.
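As a possible first step, here is a rough sketch of checking whether a
file starts with the OneNote section header. It only looks at the first
16 bytes, which the spec describes as a file-type GUID. The GUID value
below is quoted from memory of the file header section of the spec, so
treat it as an assumption and check it against the documents linked
above. The class and method names are made up for illustration and are
not part of Tika or POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical helper, not an existing Tika or POI class.
final class OneNoteHeaderCheck {

    // Assumption: file-type GUID for .one section files, recalled from
    // the spec's header description. Verify before relying on it.
    static final UUID ONE_FILE_TYPE =
            UUID.fromString("7b5c52e4-d88c-4da7-aeb1-5378d02996d3");

    // Serialises a GUID in the usual Microsoft on-disk layout: the first
    // three fields little-endian, the remaining eight bytes in order.
    static byte[] guidToBytes(UUID guid) {
        long msb = guid.getMostSignificantBits();
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt((int) (msb >>> 32));    // first field, 32 bits
        buf.putShort((short) (msb >>> 16)); // second field, 16 bits
        buf.putShort((short) msb);          // third field, 16 bits
        buf.order(ByteOrder.BIG_ENDIAN).putLong(guid.getLeastSignificantBits());
        return buf.array();
    }

    // Returns true if the given bytes begin with the expected GUID.
    static boolean looksLikeOneNoteSection(byte[] header) {
        if (header == null || header.length < 16) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, 16), guidToBytes(ONE_FILE_TYPE));
    }
}
```

You would read the first 16 bytes of the file and pass them to
looksLikeOneNoteSection(). Everything after that (the rest of the
header, the file node lists, the object data) is where the real work
is, and this sketch doesn't attempt any of it.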
Apache POI would probably be a good home for most of the OneNote code if
you do get it working. Please consider contributing it there if you make
progress!
Nick