Hi all,

I want to be able to extract text from Visio documents, so I can chuck them into Lucene.

So, I've made use of all the handy documentation from vsdump (http://www.gnome.ru/projects/vsdump_en.html), and I've committed some basic code for Visio files to the scratchpad, as hdgf.

The code is able to parse the pointers and streams, which seem to be the main building blocks of Visio files. It also has a command line tool that prints out the streams and pointers, along with their parent-child relationships.
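
To give a rough idea of the sort of output I mean, here is a minimal sketch of a parent-child dump. It is not the scratchpad code: the StreamTreeDump, Pointer and Stream names, the fields on them, and the type/offset/length values are all made up for illustration, and the real on-disk layout is not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical classes, not the actual hdgf ones.
class StreamTreeDump {

    /** A pointer: assumed here to carry the type, offset and length of a stream. */
    static final class Pointer {
        final int type;
        final int offset;
        final int length;

        Pointer(int type, int offset, int length) {
            this.type = type;
            this.offset = offset;
            this.length = length;
        }
    }

    /** A stream, reached via a pointer, which may hold child streams. */
    static final class Stream {
        final Pointer pointer;
        final List<Stream> children = new ArrayList<>();

        Stream(Pointer pointer) {
            this.pointer = pointer;
        }

        Stream addChild(Stream child) {
            children.add(child);
            return this;
        }
    }

    /** Renders the stream and all of its descendants, one line per stream. */
    static String dump(Stream stream) {
        StringBuilder out = new StringBuilder();
        dump(stream, 0, out);
        return out.toString();
    }

    // Depth-first walk; each level of nesting is indented by two spaces.
    private static void dump(Stream stream, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        Pointer p = stream.pointer;
        out.append(String.format("stream type=0x%02x offset=%d length=%d\n",
                p.type, p.offset, p.length));
        for (Stream child : stream.children) {
            dump(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Made-up values, purely to show the shape of the output.
        Stream root = new Stream(new Pointer(0x14, 36, 120));
        Stream child = new Stream(new Pointer(0x15, 200, 64));
        child.addChild(new Stream(new Pointer(0x16, 300, 32)));
        root.addChild(child);
        System.out.print(dump(root));
    }
}
```

Each stream gets one line, and children are indented under their parent, so the nesting of the output mirrors the nesting in the file.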

Annoyingly, I haven't yet figured out how to get strings out of a strings stream, so I can't actually use it with Lucene. Hopefully the vsdump guys will get that cracked shortly, and then I can add the functionality in.

Nick
