I moved the proposed bb corpora to another directory in case they are ever
needed again. They can be found in /home/bbmass/uploadedcorpora-obsolete.
Moved: bb-doc, bb-fredt, bb-trec_enron, bb-zmi
(All except for trec_enron were nearly empty anyway.)
John agreed to clean up bb-jhardin.
Justin, would it be OK to remove the oldest year from your bb-jm corpus?
On 01/08/2011 04:20 PM, Warren Togami Jr. wrote:
It appears that some of the bb* corpora are extremely old and no
longer representative of modern mail. Would anyone object if I went
ahead and cleaned them up a bit? Proposed changes below. Yes, this
would shrink the ham sample size, but my active masscheck recruiting
should grow that, and I think we're better off with quality data from
more recent ham than quantity of old ham.