Mark Elston asked on comp.emacs for something to ASCIIify text he cut-and-pasted from the web. Here's a little thing I use for just that job:
;;; cleanup.el --- Clean up quoted material of a few common types ;; Copyright (C) 2005 by Tom Breton <[EMAIL PROTECTED]> ;; Author: Tom Breton <[EMAIL PROTECTED]> ;; Keywords: convenience ;; This file is free software; you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 2, or (at your option) ;; any later version. ;; This file is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 59 Temple Place - Suite 330, ;; Boston, MA 02111-1307, USA. ;;; Commentary: ;; Rationale: Do you sometimes cut and paste from web pages into ;; Emacs? Sometimes the result isn't ASCII text, but an ugly ;; near-ASCII text that can't be saved with plain encoding. ;; Solution: Call `tehom-cleanup-quoted' for HTML that used entities ;; for characters (eg, "“"). Or `tehom-cleanup-a-hat' for the ;; encoding where each escape sequence starts with the circumflexed a ;; (looks like "a" with a hat on it). ;; There's also `tehom-cleanup-google' that cleans up text that came ;; from Google groups. ;; If none of those apply, it's easy to define another translation: ;; Define a list where each element is two strings, (OLD-TEXT ;; NEW-TEXT). If OLD-TEXT is not plain text, call ;; `tehom-escape-unprintables' to translate it to plain text. Define ;; a call `tehom-cleanup-quoted-aux' with that list as an argument - ;; and there's your translator. ;; These translation lists are just compilations of the common escape ;; sequences that I found I needed to fix, so they can certainly be ;; extended. ;;; Code: (defconst tehom-cleanup-html-equivs (cons (list (char-to-string 2217) "(c)") '( ( "“" "``") ( "”" "''") ( "‘" "`") ( "’" "'") ( "…" "...") ("–" "-") ("—" " -- ") )) "HTML entity ASCII equivalents") (defconst tehom-cleanup-google-removals '( ( "- Hide quoted text - - Show quoted text - " "") ( "Reply | Reply to Author | Forward | Print | Individual Message | Show original | Report Abuse" "") ( "Reply | Reply to Author | Forward | Print | View Thread | Show original | Report Abuse " "") ( " - Find messages by this author" "") ) "Google groups sequences that you probably want to get rid of" ) (defvar tehom-cleanup-ahat-equivs '( ("\x8e2??" "'") ("\x8e2?~" "`") ("\x8e2?\x8a6" "...") ("\x8e2\x3f\x8a2" "*>") ) "A-hat ASCII equivalents" ) ;;Maintenance helper. (defun tehom-escape-unprintables (str) "Insert STR, all escaped. For translating new sequences." (dotimes (i (length str)) (insert (format "\\x%x" (aref str i))))) ;;;###autoload (defun tehom-cleanup-quoted () "Clean up quoted HTML PCDATA" (interactive) (tehom-cleanup-quoted-aux tehom-cleanup-html-equivs)) ;;;###autoload (defun tehom-cleanup-google () "Clean up captures from new Google news" (interactive) (tehom-cleanup-quoted-aux tehom-cleanup-google-removals)) ;;;###autoload (defun tehom-cleanup-a-hat () "Clean up captures from new Google news" (interactive) (tehom-cleanup-quoted-aux tehom-cleanup-ahat-equivs)) (defun tehom-cleanup-quoted-aux (replacements) "Cleanup quoted HTML PCDATA" (interactive) (require 'cl) (dolist (i replacements) (goto-char (point-min)) (replace-string (first i) (second i))) (goto-char (point-min)) (delete-blank-lines) (delete-blank-lines)) ;;; cleanup.el ends here -- Tom Breton, the calm-eyed visionary _______________________________________________ Gnu-emacs-sources mailing list Gnu-emacs-sources@gnu.org http://lists.gnu.org/mailman/listinfo/gnu-emacs-sources