On Thu, 06 Nov 2008 10:43:38 -0600 "Robert D. Crawford" <[EMAIL PROTECTED]> wrote:
RDC> Score files are great.  Truth be told, I'm just looking for what works.
RDC> I like your solution but it will exclude posts with unicode characters,
RDC> which is something I would like to avoid if possible.

OK, so the question now is "how to tell if a character is in the Asian
Unicode character ranges."  Unfortunately I recall Emacs' own character
database will misrepresent some Latin characters, so I wouldn't depend
on character properties.  I looked at
ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt and picked the blocks
that looked useful.

(defun zme ()
  (let ((data "
0D00..0D7F; Malayalam
0D80..0DFF; Sinhala
0E00..0E7F; Thai
0E80..0EFF; Lao
0F00..0FFF; Tibetan
1000..109F; Myanmar
1780..17FF; Khmer
1800..18AF; Mongolian
1900..194F; Limbu
1950..197F; Tai Le
1980..19DF; New Tai Lue
19E0..19FF; Khmer Symbols
1A00..1A1F; Buginese
1B00..1B7F; Balinese
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31C0..31EF; CJK Strokes
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4DC0..4DFF; Yijing Hexagram Symbols
4E00..9FFF; CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
AC00..D7AF; Hangul Syllables
F900..FAFF; CJK Compatibility Ideographs")
        (out ""))
    (dolist (line (split-string data "\n"))
      (dolist (item (split-string line ";"))
        (when (string-match "\\([0-9A-F]+\\)\\.\\.\\([0-9A-F]+\\)" item)
          (setq out (concat out
                            (format "\\u%s-\\u%s"
                                    (match-string 1 item)
                                    (match-string 2 item)))))))
    (concat "[^" out "]")))

Evaluating this (you have to load the 'cl library too) gives

"[^\\u0D00-\\u0D7F\\u0D80-\\u0DFF\\u0E00-\\u0E7F\\u0E80-\\u0EFF\\u0F00-\\u0FFF\\u1000-\\u109F\\u1780-\\u17FF\\u1800-\\u18AF\\u1900-\\u194F\\u1950-\\u197F\\u1980-\\u19DF\\u19E0-\\u19FF\\u1A00-\\u1A1F\\u1B00-\\u1B7F\\u2E80-\\u2EFF\\u2F00-\\u2FDF\\u2FF0-\\u2FFF\\u3000-\\u303F\\u3040-\\u309F\\u30A0-\\u30FF\\u3100-\\u312F\\u3130-\\u318F\\u3190-\\u319F\\u31A0-\\u31BF\\u31C0-\\u31EF\\u31F0-\\u31FF\\u3200-\\u32FF\\u3300-\\u33FF\\u3400-\\u4DBF\\u4DC0-\\u4DFF\\u4E00-\\u9FFF\\uA000-\\uA48F\\uA490-\\uA4CF\\uAC00-\\uD7AF\\uF900-\\uFAFF]" I don't know if this is good enough for you, but the ranges are correct at least and you see how you can add more. I tested with a few characters like this: (string-match (zme) "helloà´€") and it seems to work OK. In a score file you'll have only one backslash but otherwise it should work. Ted _______________________________________________ info-gnus-english mailing list [email protected] http://lists.gnu.org/mailman/listinfo/info-gnus-english
