On Saturday, January 8, 2011, 6:20:19 PM, Warren Jr. wrote:
> It appears that some of the bb* corpora are extremely old and no
> longer representative of modern mail. Would anyone object if I went
> ahead and cleaned it up a bit? Proposed changes below. Yes, this
> would shrink the ham sample size, but my active masscheck recruiting
> should grow that, and I think we're better off with quality data from
> more recent ham than quantity of old ham.
+1

Old corpora may result in incorrect scores being applied to current messages. There should be a generalized expiration strategy for the corpora.

Cheers,
Jeff C.
