Are there any issues if I just rename the Word doc from file.doc to
file.txt, then open the file as a text document and parse it for the data I
need?  I know that the Word document format is not straight ASCII text, but
it appears that the data itself is.

That is TOTALLY wrong, no offense... the Word document format is
actually a structured-storage document composed of a tree of elements,
and each element is a list of text snippets (some in use, some stale
noise) in a nonlinear linked list. If you simply run "strings" on the
file, you'll end up with a lot of unrelated text in apparently random
order.  Some of that text can even come from another, unrelated document,
or from prior versions of the document (or the template it was horked from).
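
To make that concrete, here is a rough Python sketch of what renaming to
.txt and scanning for text amounts to. The sample bytes are made up for
illustration and are not a real Word file; the function names are my own.
The only real-world detail is the 8-byte signature that starts an OLE2
structured-storage file.

```python
import re

# Signature found at the start of an OLE2 compound (structured-storage) file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_structured_storage(data):
    """True if the bytes start with the OLE2 signature, whatever the file is named."""
    return data.startswith(OLE2_MAGIC)


def ascii_runs(data, min_len=4):
    """Rough equivalent of 'strings': printable ASCII runs in file order."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical file contents: snippets stored out of reading order,
# with a leftover snippet from an earlier draft sitting between them.
fake_doc = (
    OLE2_MAGIC
    + b"\x00" * 8
    + b"second paragraph"
    + b"\x01\x02"
    + b"deleted draft text"
    + b"\x00\x03"
    + b"first paragraph"
)

print(looks_like_structured_storage(fake_doc))
for run in ascii_runs(fake_doc):
    print(run)
```

The scan returns the snippets in storage order and cannot tell the stale
one from the live ones, which is exactly the problem. Renaming the file
changes none of this, since the bytes are the same.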

Seriously, if you want the text from a doc file, use IFilter. If you
need a .NET version, just say so.

--
"I am Dyslexic of Borg. Resistors are fertile. Prepare to have your
ass laminated." -- Dan Nitschke

Marc C. Brooks
http://musingmarc.blogspot.com

===================================
This list is hosted by DevelopMentor®  http://www.develop.com

View archives and manage your subscription(s) at http://discuss.develop.com