Yes, I was able to extract plain text from OOXML, without any formatting
codes, as a first step.  But the tags still need a lot of careful checking
to find the right end of each paragraph.

For example, run this at the prompt:

7z e -so -y yourdocument.doc > outfile.xml
ex doc2txt.ex outfile.xml

This is the code of doc2txt.ex (requires the Euphoria interpreter):


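include get.e   -- provides wait_key()

integer infile
integer char
sequence out
sequence currtag
integer IsTag

out = ""        -- text found outside of tags
currtag = ""    -- tag currently being read
IsTag = 0       -- 1 while inside <...>

infile = open("outfile.xml", "rb")
if infile = -1 then
    puts(1, "Cannot open outfile.xml\n")
    abort(1)
end if

while 1 do  -- loop until end of file
    char = getc(infile)
    if char = -1 then   -- end of file
        exit
    end if
    if IsTag then
        currtag = currtag & char
        if char = '>' then
            IsTag = 0   -- tag complete; currtag holds it for inspection
        end if
    elsif char = '<' then
        IsTag = 1
        currtag = "<"   -- start a new tag
    else
        out = out & char
    end if
end while

close(infile)

puts(1, out)

-- wait for a key press before exiting
if wait_key() then
end if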
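
The end-of-paragraph check mentioned at the top could be sketched roughly
like this. It is only a sketch, not part of doc2txt.ex: the names
strip_tags and in_tag are made up for the illustration, and it assumes that
a paragraph in the document XML ends with the tag </w:p>. Empty paragraphs
written as a self-closing tag would need an extra check.

```euphoria
-- Sketch: strip tags, but turn every closing </w:p> into a line break.
-- Assumption: each paragraph ends with the tag </w:p>.

function strip_tags(sequence xml)
    sequence out, currtag
    integer in_tag, char

    out = ""
    currtag = ""
    in_tag = 0

    for i = 1 to length(xml) do
        char = xml[i]
        if in_tag then
            currtag = currtag & char
            if char = '>' then
                in_tag = 0
                -- the tag is complete, so it can be compared now
                if equal(currtag, "</w:p>") then
                    out = out & '\n'
                end if
            end if
        elsif char = '<' then
            in_tag = 1
            currtag = "<"   -- start collecting a new tag
        else
            out = out & char
        end if
    end for
    return out
end function

-- Two short paragraphs, each should come out on its own line
puts(1, strip_tags("<w:p><w:r><w:t>Hello</w:t></w:r></w:p>" &
                   "<w:p><w:r><w:t>World</w:t></w:r></w:p>"))
```

The tag is compared only after its closing '>' has been read, so the whole
tag name is available for the test. Other tags could be handled in the same
place.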
