Title: Re: =20 line endings after Remove HTML
The =20 stuff comes from messages using Quoted Printable encoding. I almost never get them any more, but I have a script that fixes them. There can be stuff also like =3D for hyphens, and so on. Dan Crevier wrote it years ago using a scripting addition that didn’t make it to OS X. I added the vanilla AppleScript subroutine to do the decoding. It has a lot of options I’ve turned off so the script does nothing but decode QP and remove extra spaces (it could be much shorter if the unused code were deleted). This should fix all of them:


(*
    Clean message
    Dan Crevier 2/25/98
    <mailto:[EMAIL PROTECTED]>

    A configurable script to clean messages. Run this script with a message window open to
    optionally:
    • Clean subjects
    • Reform paragraphs
    • Fix URLs
    • Remove the forwarding header
    • Remove extra spaces
    • Remove quoting
    • Decode quoted-printable
    • Convert tabs to spaces

    The options are described in more detail below.  To turn the options on and off, change
    the property value to true or false.
    
    You can save multiple copies of the script with different sets of options
    
    Thanks to Guy Kawasaki and David Cortright
*)

(* cleanSubject
  If true, subjects will have extraneous Fw:, (fwd), etc removed
   and leading and trailing spaces will be removed
*)
property cleanSubject : false

(* reformParagraphs
  If true, paragraphs will be reformed. This only works when paragraphs
  are separated by blank lines.
  If removeQuoting is not turned on, it will not look good on quoted lines
*)
property reformParagraphs : false

(* fixURLs
  If true:
    • Adds <>'s to URL's
    • Adds ftp:// to words beginning with ftp. (probable ftp site addresses)
    • Adds http:// to words beginning with www. (probable web site addresses)
    • Adds mailto: to words that contain @'s (probable email addresses)
*)

property fixURLs : false

(* removeForwardHeader
  If true, it removes the text before "---------- Forwarded message ----------"
*)
property removeForwardHeader : false

(* removeExtraSpaces
  If true, it removes extra spaces between words
*)
property removeExtraSpaces : true

(* removeQuoting
  If true, it removes >'s from the beginning of the line
*)
property removeQuoting : false

(* decodeQuotedPrintable
  If true, decodes quoted printable text (such as =3D )
  Requires Dan Crevier's DecodeQP OSAX (included)
*)
property decodeQuotedPrintable : true

(* tabsToSpaces
  If true, converts tabs to 5 spaces
  Note: the spaces may be removed by the removeExtraSpaces, or removeQuoting options
*)
property tabsToSpaces : false

on run {}
    if not CheckMessageWindow() then return
    
    tell application "Microsoft Entourage"
        
        -- ••�clean subject
        if cleanSubject then
            set theSubject to ""
            try
                set theSubject to the subject of front window
            on error
                -- Can't get subject of outgoing message
            end try
            if theSubject = "" then
                try
                    set theSubject to the subject of the displayed message of the front window
                end try
            end if
            if theSubject ≠ "" then -- No subject, forget it!
                
                set theSubject to my SearchReplace(theSubject, " (fwd)", "")
                set theSubject to my SearchReplace(theSubject, "Fw: ", "")
                set theSubject to my SearchReplace(theSubject, "FW: ", "")
                if theSubject does not start with "Re: " then set theSubject to my SearchReplace(theSubject, "Re: ", "")
                repeat while theSubject starts with "Re: Re: "
                    set theSubject to my SearchReplace(theSubject, "Re: Re: ", "Re: ")
                end repeat
                set theSubject to my SearchReplace(theSubject, "Re[2]: ", "")
                set theSubject to my SearchReplace(theSubject, "Fwd[2]:", "")
                
                -- strip leading and trailing spaces
                repeat while theSubject begins with " "
                    set theSubject to text 2 thru -1 of theSubject
                end repeat
                repeat while theSubject ends with " "
                    set theSubject to text 1 thru -2 of theSubject
                end repeat
                
                set the subject of the front window to theSubject
            end if
        end if
        
        -- ••�clean text
        set theText to the content of the front window
        
        -- ••�clean forward header
        if removeForwardHeader then
            -- remove signature
            set headerEnd to offset of "----------" in theText
            if (headerEnd is not 0) and (headerEnd is not greater than ((length of theText) / 2)) then
                set theText to text (headerEnd + 10) thru -1 of theText
                -- remove old headers
                set headerEnd to offset of (return & return) in theText
                if headerEnd is not 0 then
                    set theText to text (headerEnd + 3) thru -1 of theText
                end if
            end if
        end if
        
        -- ••�Decode quoted-printable
        if decodeQuotedPrintable then
            try
                set theText to my decodeqp(theText)
            on error
                display dialog "Quoted-printable could not be decoded because the DecodeQP " & ¬
                    "scripting addition is not installed. See script for details." buttons {"Okay"} ¬
                    with icon stop
            end try
        end if
        
        -- •• tabs to spaces
        if tabsToSpaces then
            set theText to my SearchReplace(theText, tab, "     ")
        end if
        
        -- •• remove quoting / reform paragraphs
        if removeQuoting or reformParagraphs then
            set lastWasText to false
            set theParagraphs to the paragraphs of theText
            set theNewText to ""
            repeat with theLine in theParagraphs
                set lineLength to length of theLine
                set startChar to 1
                if removeQuoting then
                    repeat while startChar ≤ lineLength
                        set theChar to character startChar of theLine
                        if theChar is not " " and theChar is not ">" then
                            exit repeat
                        end if
                        set startChar to startChar + 1
                    end repeat
                end if
                if (startChar ≥ length of theLine) then
                    set theNewText to theNewText & return
                    if reformParagraphs and lastWasText then
                        set lastWasText to false
                        set theNewText to theNewText & return
                    end if
                else
                    set theNewText to theNewText & ((text startChar thru -1 of theLine) as string) & " "
                    if not reformParagraphs then
                        set theNewText to theNewText & return
                    end if
                    set lastWasText to true
                end if
            end repeat
            set theText to theNewText
        end if
        
        -- •• remove extra spaces/fix URLs
        if removeExtraSpaces or fixURLs then
            set theLines to the paragraphs of theText
            
            set oldDelimiter to AppleScript's text item delimiters
            set AppleScript's text item delimiters to space
            
            set newText to ""
            
            repeat with theLine in theLines
                set newLine to ""
                
                set theWords to the text items of theLine
                repeat with theWord in theWords
                    if (length of theWord is greater than 0) or (not removeExtraSpaces) then
                        if fixURLs then
                            set theWord to my ProcessURL(theWord)
                        end if
                        set newLine to newLine & theWord & " "
                    end if
                end repeat
                -- strip trailing space
                if (length of newLine) is greater than 1 then
                    set newLine to (text 1 thru -2 of newLine)
                end if
                set newText to newText & newLine & return
            end repeat
            set theText to newText
        end if
        
        set the content of the front window to theText
    end tell
end run

-- routine to do a search and replace on text
-- routine enhanced by David Cortright
on SearchReplace(mainString, searchString, replaceString)
    set newString to mainString
    set lSearchString to length of searchString
    set lReplaceString to length of replaceString
    set processedOffset to 0
    
    considering case
        repeat
            set lNewString to length of newString
            set unprocessedString to text (processedOffset + 1) thru lNewString of newString
            set foundOffset to (offset of searchString in unprocessedString) + processedOffset
            if foundOffset = processedOffset then exit repeat
            
            if foundOffset is 1 then
                set newString to replaceString & (text (lSearchString + 1) thru lNewString of newString)
                set processedOffset to lReplaceString
                
            else if ((foundOffset + lSearchString - 1) = lNewString) then
                set newString to (text 1 thru (foundOffset - 1) of newString) & replaceString
                exit repeat
                
            else
                set newString to ¬
                    (text 1 thru (foundOffset - 1) of newString) & replaceString & ¬
                    (text (foundOffset + lSearchString) thru lNewString of newString)
                set processedOffset to foundOffset + lReplaceString - 1
            end if
            
            if newString does not contain searchString then exit repeat
        end repeat
        return newString
    end considering
end SearchReplace

-- given a string, it tries to "fix" it, if it's a URL
on ProcessURL(inString)
    if inString ends with "." then
        if length of inString > 1 then
            return ProcessURL(text 1 thru -2 of inString) & "."
        end if
    end if
    
    -- just needs <>
    if inString begins with "ftp:" or inString begins with "http:" or inString begins with "mailto:" then
        return "<" & inString & ">"
    end if
    
    -- check for bad ftp URLs
    if inString begins with "ftp." then
        return "<ftp://" & inString & ">"
    end if
    
    -- check for bad http URLs
    if inString begins with "www." then
        return "<http://" & inString & ">"
    end if
    
    -- check for bad mailto URLs
    if inString is not "@" and inString does not start with "<" and inString contains "@" then
        return "<mailto:" & inString & ">"
    end if
    
    return inString
end ProcessURL

-- make sure a outgoing message window is frontmost
on CheckMessageWindow()
    tell application "Microsoft Entourage"
        set msgClass to the class of the front window
        if msgClass is not draft window and msgClass is not message window then
            display dialog "This script is designed to work with a message window frontmost." buttons {"Okay"} with icon stop
            return false
        end if
    end tell
    return true
end CheckMessageWindow

on decodeqp(aString)
    set HexChars to "0123456789ABCDEF"
    set oldDelims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {"="}
    set workList to text items of aString
    set returnValue to ""
    set firstOne to true
    repeat with s in workList
        if firstOne then
            set firstOne to false
            set returnValue to s
        else
            if character 1 of s is return then set returnValue to returnValue & s
            try
                set decNum to 0
                set pair to characters 1 thru 2 of s
                if item 1 of pair is not in HexChars or item 2 of pair is not in HexChars then
                    set decNum to 0
                else
                    set decNum to ((offset of (item 1 of pair) in HexChars) - 1) * 16 + ((offset of (item 2 of pair) in HexChars) - 1)
                end if
                if decNum ≠ 0 then
                    set returnValue to returnValue & (ASCII character decNum)
                    if (length of s) ≥ 3 then
                        set returnValue to returnValue & text 3 thru -1 of s
                    else
                        set returnValue to returnValue & s
                    end if
                end if
            on error errMsg number errNum from errFrom partial result errResult to errTo
                -- on error code goes here
                display dialog errMsg & "Number:" & errNum
                return
            end try
            
        end if
    end repeat
    set AppleScript's text item delimiters to oldDelims
    return returnValue
end decodeqp




On or near 12/5/02 10:22 AM, Gary Lists at [EMAIL PROTECTED] observed:

> Allen (since I use your script to remove HTML),
>
> is there an edit I can make to eliminate the =20 line endings on some mail
> after removing the HTML formatting via AS?
>
> Or some other setting/change I can make?
>
> Gary
>
>

--
Microsoft MVP for Entourage/OE/Word (MVPs are volunteers)
Allen Watson <[EMAIL PROTECTED]> Entourage FAQ site: <http://www.entourage.mvps.org/>
AppleScripts for Outlook Express and Entourage:
 <http:[EMAIL PROTECTED]/Scripts/>
Entourage Help Pages: <http://www.entourage.mvps.org/>

Reply via email to