Hi All
I want to be able to extract text from Visio documents, so I can chuck
them into Lucene.
So, I've made use of all the handy documentation from vsdump
(http://www.gnome.ru/projects/vsdump_en.html), and I've committed some
basic code for Visio files to the scratchpad, as hdgf.
The code can parse the pointers and streams, which seem to be the main
building blocks of Visio files. It also has a command line tool that
prints out the streams and pointers, along with their parent-child
relationships.
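To illustrate the pointer/stream structure and the parent-child dump, here is a minimal sketch. It is NOT the hdgf scratchpad API: the Pointer and Stream classes, their fields (type, offset, length), and the dump() method are hypothetical, and the type codes and offsets in main() are made-up example values. It only assumes that a stream is located by a pointer and may in turn point on to child streams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only, not the hdgf code: streams located by pointers,
// where a stream may point on to child streams.
public class StreamTreeSketch {
    // Where a stream lives in the file (fields are assumptions for illustration).
    record Pointer(int type, int offset, int length) {}

    // A stream, located by its pointer, plus any child streams it points to.
    static final class Stream {
        final Pointer pointer;
        final List<Stream> children = new ArrayList<>();

        Stream(Pointer pointer) {
            this.pointer = pointer;
        }

        // Adds a child and returns this stream, so calls can be chained.
        Stream add(Stream child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first walk: each stream is printed indented under its parent.
    static void dump(Stream s, int depth, StringBuilder out) {
        out.append("  ".repeat(depth))
           .append(String.format("stream type=0x%x offset=%d length=%d",
                   s.pointer.type(), s.pointer.offset(), s.pointer.length()))
           .append('\n');
        for (Stream child : s.children) {
            dump(child, depth + 1, out);
        }
    }

    static String dump(Stream root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Example values only, not taken from a real Visio file.
        Stream root = new Stream(new Pointer(0x14, 0, 100))
            .add(new Stream(new Pointer(0x15, 100, 40)))
            .add(new Stream(new Pointer(0x16, 140, 20)));
        System.out.print(dump(root));
    }
}
```

Running main() prints the root stream followed by its two children, each indented one level, which mirrors the kind of parent-child listing the command line tool gives.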
Annoyingly, I haven't figured out how to get strings out of a strings
stream, so I can't actually use it with Lucene yet. Hopefully the vsdump
guys will get that cracked shortly, and I can add the functionality in.
Nick