Hi,

> Yes, I was able to extract plain text from OOXML, without any format
> codes, as a first step. But it needs a lot of careful checking of the
> tags to get the right end of paragraph.
>
> For example, do this at the prompt:
>
> 7z e -so -y yourdocument.doc > outfile.xml
> ex doc2txt.ex outfile.xml
>
> This is the code of doc2txt.ex (requires the Euphoria interpreter)

Here is a shorter variant, using sed:

www.delorie.com/gnu/docs/sed/sed.1.html
ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/

sed -e 's/<[^>]*>//g' < outfile.xml > outfile.txt

You may need extra expressions for the rare cases where there are
line breaks inside tags. You can also add expressions, for example
to convert &amp; to & later in the document text. Just add more
-e 's/in/out/g' options. Note that & is special on the right-hand
side of s///, so it has to be written as \& there.

Eric ;-)
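
A rough sketch of how those extra expressions might be combined, in case it helps. This assumes a POSIX-style shell with tr and sed available (GNU sed was what I had in mind, since the joined input becomes one very long line), and it assumes the text comes from WordprocessingML, where </w:p> closes a paragraph. The function name strip_tags is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads XML on stdin, writes plain text on stdout.
strip_tags() {
    # 1. tr joins all lines (CR and LF become spaces), so a tag that was
    #    split across a line break ends up on a single line.
    # 2. Each paragraph end tag </w:p> becomes a real newline
    #    (backslash followed by a newline inside the sed replacement).
    # 3. All remaining tags are deleted.
    # 4. Common entities are decoded. &amp; is handled last so that
    #    "&amp;lt;" ends up as the literal text "&lt;" and not "<".
    #    The replacement is \& because a bare & means "the matched text".
    tr '\r\n' '  ' | sed -e 's/<\/w:p>/\
/g' \
        -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g'
}
```

After defining the function, something like strip_tags < outfile.xml > outfile.txt should give one paragraph per line. Line breaks in the original XML turn into spaces, which is usually harmless but can leave some extra blanks in the text.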