https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7115
--- Comment #6 from Mark Martinec <[email protected]> ---
Created attachment 5263
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5263&action=edit
added configurability

> > I'd REALLY like to see this extra tokenizing as a switchable option.
> Will do something along these lines.

Here it comes. Adds a config option, and conditionalizes sources
of input to Bayes. Most of the diff is due to indentation changes,
consistency of variable names, and some cosmetics.

This is the added documentation (man Mail::SpamAssassin::Conf):

bayes_token_sources  (default: header visible invisible uri)
    Controls which sources in a mail message can contribute tokens
    (e.g. words, phrases, etc.) to a Bayes classifier. The argument is
    a space-separated list of keywords (header, visible, invisible,
    uri, mimepart), each of which may be prefixed by "no" to indicate
    its exclusion. Additionally two reserved keywords are allowed:
    all and none (or: noall). The list of keywords is processed
    sequentially: a keyword all adds all available keywords to a set
    being built, a none or noall clears the set, other non-negated
    keywords are added to the set, and negated keywords are removed
    from the set. Keywords are case-insensitive.

    The default set is: header visible invisible uri, which is
    equivalent for example to: All NoMIMEpart. The reason why mimepart
    is not currently in the default set is that it is a newer source
    (introduced with SpamAssassin version 3.4.1) and not much
    experience has yet been gathered regarding its usefulness.

    See also option "bayes_ignore_header" for fine-grained control
    over individual header fields under the umbrella of the more
    general keyword header here.
    Keywords imply the following data sources:

    header - tokens collected from a message header section
    visible - words from visible text (plain or HTML) in a message body
    invisible - hidden/invisible text in HTML parts of a message body
    uri - URIs collected from a message body
    mimepart - digests (hashes) of all MIME parts (textual or
      non-textual) of a message, computed after Base64 and
      quoted-printable decoding, suffixed by their Content-Type
    all - adds all the above keywords to the set being assembled
    none or noall - removes all keywords from the set

    The "bayes_token_sources" directive may appear multiple times, its
    keywords are interpreted sequentially, adding or removing items
    from the final set as they appear in their order in
    "bayes_token_sources" directive(s).
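
For illustration, the sequential keyword processing described above could be
sketched roughly as follows. This is a Python sketch only, not the Perl code
from the attachment: the names ALL_SOURCES, DEFAULT_SOURCES and
apply_token_sources are made up for this example, and it assumes that
processing starts from the default set and that each further directive
continues from the set left by the previous one.

```python
# Hypothetical sketch of the documented semantics; NOT SpamAssassin's code.
ALL_SOURCES = ("header", "visible", "invisible", "uri", "mimepart")

# Assumption: the set being built starts out as the documented default.
DEFAULT_SOURCES = frozenset({"header", "visible", "invisible", "uri"})


def apply_token_sources(value, current=DEFAULT_SOURCES):
    """Apply one bayes_token_sources argument string to the current set."""
    result = set(current)
    for word in value.split():
        keyword = word.lower()  # keywords are case-insensitive
        if keyword == "all":
            # "all" adds every available keyword to the set
            result.update(ALL_SOURCES)
        elif keyword in ("none", "noall"):
            # "none" / "noall" clears the set
            result.clear()
        elif keyword in ALL_SOURCES:
            # a non-negated keyword is added
            result.add(keyword)
        elif keyword.startswith("no") and keyword[2:] in ALL_SOURCES:
            # a negated keyword ("no" prefix) is removed
            result.discard(keyword[2:])
        else:
            raise ValueError(f"unknown keyword: {word!r}")
    return result


# Two directives in a row: the second continues from the first's result.
sources = apply_token_sources("all")
sources = apply_token_sources("NoURI", sources)
print(sorted(sources))
```

Under these assumptions, "All NoMIMEpart" yields the same set as the default,
and "none header" leaves only header.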
