Yes, I was able to extract plain text from OOXML, without any formatting codes. It is a first step, but the tags still need a lot of careful checking to find the right end of each paragraph.
For example, do this at the prompt:

    7z e -so -y yourdocument.docx > outfile.xml
    ex doc2txt.ex outfile.xml

This is the code of doc2txt.ex (requires the Euphoria interpreter):

    include get.e                       -- for wait_key()

    integer infile
    integer char
    sequence out
    sequence currtag
    atom IsTag

    out = ""
    currtag = ""
    IsTag = 0

    -- The input name is hard-coded; the command-line argument is not read.
    infile = open("outfile.xml", "rb")
    if infile = -1 then
        puts(1, "Cannot open outfile.xml\n")
        abort(1)
    end if

    while 1 do                          -- loop until end of file
        char = getc(infile)
        if char = -1 then               -- end of file
            exit
        elsif IsTag then                -- inside a tag: collect it, print nothing
            currtag = currtag & char
            if char = '>' then
                IsTag = 0
            end if
        elsif char = '<' then           -- a tag starts here
            IsTag = 1
            currtag = currtag & char
        else                            -- ordinary text
            out = out & char
        end if
    end while
    close(infile)

    puts(1, out)
    if wait_key() then end if           -- wait for a key press before exiting
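About the end-of-paragraph check: here is a minimal sketch of one way it could be done, not a finished solution. It assumes the input is WordprocessingML, where each paragraph is closed by a </w:p> tag. The function name strip_tags and the in-memory test string are made up for illustration; the tag collection follows the same idea as currtag in doc2txt.ex, except that the tag buffer is reset at every '<'.

```euphoria
-- Hypothetical helper, not part of the original doc2txt.ex.
-- Assumption: paragraphs end with the closing tag </w:p>.
function strip_tags(sequence xml)
    sequence out, currtag
    integer in_tag, c

    out = ""
    currtag = ""
    in_tag = 0

    for i = 1 to length(xml) do
        c = xml[i]
        if in_tag then
            currtag = currtag & c
            if c = '>' then             -- the tag is complete
                in_tag = 0
                if equal(currtag, "</w:p>") then
                    out = out & '\n'    -- end of paragraph: start a new line
                end if
            end if
        elsif c = '<' then
            in_tag = 1
            currtag = "<"               -- start collecting a new tag
        else
            out = out & c               -- ordinary text
        end if
    end for
    return out
end function

-- Small test: two paragraphs, printed as "One" and "Two" on separate lines.
puts(1, strip_tags("<w:p><w:r><w:t>One</w:t></w:r></w:p>" &
                   "<w:p><w:r><w:t>Two</w:t></w:r></w:p>"))
```

This only handles the plain closing tag. Empty paragraphs written as <w:p/>, line breaks, tabs, and entities such as &amp;amp; are not handled here and would need their own checks.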