Michal Nazarewicz <[EMAIL PROTECTED]> writes:

> "<<" and ">>" have codes U+00AB and U+00BB so that's why they match but
> there are plenty of other characters which may show up in an English
> text, like (I'll use a (sequence of) ASCII characters which resembles
> the proper unicode character) "`" (U+2018), "'" (U+2019), "``" (U+201C)
> , "''" (U+201D) or "..." (U+2026) which will cause the entry to be
> filtered out.
>
> Besides, I think what you really meant was:
>
>     (string-match "[^\\0-\\177]" "string")
>
> since "1ff" is not a valid octal number.
>
> I think that taking the title of the entry and checking if at least 90%
> are ASCII characters would be sufficient to filter out Asian texts.  You
> can also try taking first 100 (or so) characters of the body.  I think
> you could use replace-regexp-in-string for that purpose:
>
>     (defun mn-non-english-p (string)
>       (>
>        (* (length (replace-regexp-in-string "[^\\0-\\77]" "" string)) 10)
>        (* (length string) 9)))
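For reference, the 90% heuristic quoted above could be sketched roughly as follows. This is only a sketch under a few assumptions: the names my-ascii-ratio and my-mostly-ascii-p are hypothetical and not part of Gnus, and the [:ascii:] character class is swapped in for the octal range, since as far as I can tell a backslash is not special inside a bracket expression in Emacs regexps. Note also that the quoted comparison, as written, seems to be true when the string is mostly ASCII, so the predicate here is named for that sense.

```elisp
;; Hypothetical helpers; neither name exists in Gnus or Emacs.

(defun my-ascii-ratio (string)
  "Return the fraction of characters in STRING that are ASCII.
An empty STRING is treated as fully ASCII."
  (if (string= string "")
      1.0
    ;; Delete every non-ASCII character, then compare the lengths.
    (/ (float (length (replace-regexp-in-string "[^[:ascii:]]" "" string)))
       (length string))))

(defun my-mostly-ascii-p (string &optional threshold)
  "Return non-nil if at least THRESHOLD of STRING is ASCII.
THRESHOLD defaults to 0.9, the 90% figure suggested above."
  (>= (my-ascii-ratio string) (or threshold 0.9)))

;; A title with a couple of typographic characters still passes:
;; (my-mostly-ascii-p "Filtering nnrss entries in Gnus «howto»")  ; => t
;; A title with no ASCII characters at all does not:
;; (my-mostly-ascii-p "日本語のタイトル")                          ; => nil
```

Short titles are sensitive to the threshold: two non-ASCII quotation marks in a 19-character subject already push the ratio below 0.9, so a lower threshold may suit subject lines better.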

I like the way this looks.  It seems that it will allow the characters I
would like to keep but remove posts which I cannot read.

Here is my problem, and forgive what I can only assume is my lack of
understanding of complex scoring/filtering: I don't know how to
implement this.  I have read through the scoring section of the Gnus
Info manual and don't see anywhere that I can plug in a function to
perform this action on the subject.  I readily admit that I probably
just missed it.  If someone could point me to the place in the manual
where this is explained, I would be very appreciative.

I must add that the body of each post from this nnrss group consists of
only the following lines:

    Tables Linearized About This Style link comments About This Style
    Table contents are turned into a sequence of paragraphs, one per cell.

The part about "Tables Linearized" is added by something I use.  The
explanation is on the last line.  I mention this because I don't think
scoring on the body of the post will work in this case.

Thanks for all the help from both you and Ted,
rdc
-- 
Robert D. Crawford                                      [EMAIL PROTECTED]

Your temporary financial embarrassment will be relieved in a surprising manner.

_______________________________________________
info-gnus-english mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/info-gnus-english
