On 2/2/2010 10:33 AM, Маллиндайн Стивен (Steve Mallindine) wrote:
> There's an option (in v2 at least) to remove duplicate entries from
> the spam/not spam folders.
>
> But if memory serves, duplicate (identical) messages won't harm the
> Bayesian corpus... It's looking for word patterns... So if the same
> pattern appears in identical message bodies, but from different
> senders, why should that matter?
>
> Steve

Won't the bad words become weighted higher because they will be more frequent? Plus, eventually all the identical messages are going to overwrite the other ones, removing those bad words completely. I thought that was the idea behind having bomb tests: to prevent tons of identical spam from corrupting the corpus.

But I'm not arguing that anything should change. The likelihood of this happening is probably very minimal, and I'd much rather have DNSBL running first if it saves having to download the entire email, and thus saves resources.

_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
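
To make the weighting question concrete, here is a minimal Python sketch of a generic per-token Bayesian estimate. It is not ASSP's actual code or scoring formula; the function names, the add-one smoothing, and the sample messages are all assumptions made up for illustration. It only shows that, in a simple frequency-based model, flooding the spam folder with identical copies raises the spam score of every word in that message, and that removing duplicates undoes the effect.

```python
from collections import Counter

def tokenize(text):
    """Split a message body into lowercase word tokens."""
    return text.lower().split()

def message_counts(messages):
    """For each token, count how many messages contain it at least once."""
    counts = Counter()
    for body in messages:
        counts.update(set(tokenize(body)))
    return counts

def spam_probability(token, spam, ham):
    """Illustrative per-token estimate with add-one smoothing.

    This is a generic textbook-style formula, not ASSP's scoring.
    """
    spam_rate = (message_counts(spam)[token] + 1) / (len(spam) + 2)
    ham_rate = (message_counts(ham)[token] + 1) / (len(ham) + 2)
    return spam_rate / (spam_rate + ham_rate)

def dedupe(messages):
    """Drop identical message bodies, keeping the first copy of each."""
    return list(dict.fromkeys(messages))

# Hypothetical sample corpus: "offer" appears once in each folder.
ham = ["meeting offer attached", "lunch tomorrow"]
spam = ["cheap pills offer", "win money now"]

# The same spam message arrives 20 more times.
flooded_spam = spam + ["cheap pills offer"] * 20

base = spam_probability("offer", spam, ham)
flooded = spam_probability("offer", flooded_spam, ham)
cleaned = spam_probability("offer", dedupe(flooded_spam), ham)

print(f"unique corpus:  {base:.3f}")
print(f"with duplicates: {flooded:.3f}")
print(f"after dedupe:   {cleaned:.3f}")
```

Under these assumptions the score for "offer" rises once the duplicates are added and returns to its original value after deduplication. Whether that matters in practice depends on how the real corpus is built and capped, which this sketch does not model.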
