I moved the proposed bb corpora to another directory in case they are ever needed again. They can be found in /home/bbmass/uploadedcorpora-obsolete.

Moved: bb-doc, bb-fredt, bb-trec_enron, bb-zmi
(All except for trec_enron were nearly empty anyway.)

John agreed to clean up bb-jhardin.
Justin, would it be OK to remove the oldest year from your bb-jm corpus?

On 01/08/2011 04:20 PM, Warren Togami Jr. wrote:
It appears that some of the bb* corpora are extremely old and no
longer representative of modern mail.  Would anyone object if I went
ahead and cleaned it up a bit?   Proposed changes below.  Yes, this
would shrink the ham sample size, but my active masscheck recruiting
should grow that, and I think we're better off with quality data from
more recent ham than quantity of old ham.
