> Yes, I was able to extract plain text from OOXML, without any format
> codes, as a first step. But the tags need a lot of careful checking
> to get the right end of each paragraph.
> For example, run this at the prompt:
> 7z e -so -y yourdocument.doc > outfile.xml
> ex doc2txt.ex outfile.xml
> This is the code of doc2txt.ex (requires the Euphoria interpreter):

Here is a shorter variant, using sed:


sed -e 's/<[^>]*>//g' < outfile.xml > outfile.txt

You may need extra expressions for the rare cases where
there are line breaks inside tags. You can also add
expressions, for example to convert &amp; to & in the
document text; just use more -e 's/in/out/g' options.
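As a rough sketch of those extra expressions (not tested on real documents, so treat it as a starting point): the function name ooxml_to_text is made up for this example, and it assumes that paragraphs end with the WordprocessingML tag </w:p> and that only the five predefined XML entities occur. Line breaks are turned into spaces first, so a tag split across lines is still removed as a whole.

```shell
#!/bin/sh
# Hypothetical helper: read OOXML from stdin, write plain text to stdout.
# Assumption: paragraphs are closed by </w:p> (WordprocessingML).
ooxml_to_text() {
    # Join all input lines so that tags broken across lines match as a whole.
    tr -d '\r' | tr '\n' ' ' | sed \
        -e 's/<\/w:p>/\
/g' \
        -e 's/<[^>]*>//g' \
        -e 's/ *$//' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
    # The first expression turns each paragraph end into a line break,
    # the second strips all remaining tags, the third drops the blank
    # left at the very end by the line joining. &amp; is decoded last
    # so that text such as &amp;lt; ends up as &lt; and not as <.
}
```

It is used like the one-liner above: ooxml_to_text < outfile.xml > outfile.txt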

Eric ;-)

Freedos-user mailing list
