1. Remove all CR and LF characters.
2. Remove all </p>.
3. Change all <p> to CR/LF.
4. Change all <br> to CR/LF.

While I recognize this is a "first stab" heuristic, it fails because it makes too many
assumptions, starting with line endings:
Windows/DOS use CR/LF
Unix/Linux/Mac OS X use LF
Classic Mac OS uses CR
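
To make the point concrete, here is a minimal sketch of normalizing all three conventions to a single LF before doing anything else. The function name normalize_newlines is just an illustrative choice, not part of any existing tool.

```python
import re

def normalize_newlines(text: str) -> str:
    """Convert CR/LF, lone CR, and lone LF to a single LF."""
    # CR/LF is listed first so a Windows pair counts as one line break,
    # not as a CR break followed by an LF break.
    return re.sub(r"\r\n|\r|\n", "\n", text)
```

Once the text is in one known convention, later steps only have to deal with LF, and the output can be converted to whatever the target platform wants at the very end.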


What is worse, many email servers "munge" line endings as they store and forward messages,
so you cannot assume the endings you receive are the ones the sender wrote.

Also, HTML should be _parsed_, not stripped of </p> tags willy-nilly. An emerging
requirement for HTML (XHTML, for example) is that ALL tags be paired, with an opening
<TAG> and a closing </TAG>. Parsing is still required when the HTML is malformed.
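
As a rough illustration of what parsing instead of string replacement could look like, here is a sketch using Python's standard html.parser module. The names TextExtractor and html_to_text are hypothetical, and treating only <p> and <br> as line breaks is an assumption carried over from the heuristic above, not a complete HTML-to-text conversion.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text, emitting a line break at each <p> and <br>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <P> and <BR> are covered too.
        # <br/> also arrives here via the default handle_startendtag.
        if tag in ("p", "br"):
            self.parts.append("\n")

    def handle_data(self, data):
        # Raw CR/LF in the source is just whitespace in HTML, so any run of
        # whitespace collapses to one space regardless of line-ending style.
        self.parts.append(re.sub(r"\s+", " ", data))

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.parts).strip()
```

Because closing tags are simply ignored by this handler, a missing or stray </p> does not change the result, which is the case the remove-and-replace approach cannot handle reliably.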



-- You take your life in your own hands, and what happens? A terrible thing: no one to blame. -- Erica Jong, writer (1942- )


