As part of a term project I'm about to finish, I've been looking at some
aspects of the perceptron scoring we do, and I have some ideas for
alternatives I would like to try.

Can someone tell me how many email samples and how many rules typically go
into the perceptron run and how long it usually takes to generate the scores
from them? How many iterations does it usually take to converge?

These next questions are aimed at Henry or anyone else who understands them:

What do you think about using a support vector machine (SVM) instead of a
perceptron? Shouldn't that help a lot with overfitting and produce scores
that are more robust in the face of the inevitable changes that creep into
the data before the next release and rescoring?
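
To make the comparison concrete, here is a minimal sketch of the two update
rules side by side. Everything in it is an assumption for illustration: the
three rule-hit columns, the six samples, and the learning rates, epoch counts
and regularization constant are made up and are not taken from the actual
perceptron scoring code or corpus. The "SVM" here is just stochastic
subgradient descent on an L2-regularized hinge loss, not a full solver.

```python
# Toy comparison of a perceptron update and a linear-SVM-style update.
# All data and hyperparameters below are hypothetical.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(samples, epochs=100, lr=1.0):
    """Classic perceptron: update only when a sample is misclassified."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:              # y is +1 (spam) or -1 (ham)
            if y * (dot(w, x) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:                 # a full pass with no errors
            break
    return w, b

def train_linear_svm(samples, epochs=500, lr=0.05, lam=0.01):
    """Stochastic subgradient descent on lam/2*|w|^2 + hinge loss."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (dot(w, x) + b)
            # The regularizer shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1:                # wrong, or right but too close
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1

# Columns are hits for two spam rules and one ham rule (all hypothetical).
samples = [
    ([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 1, 0], 1),
    ([0, 0, 1], -1), ([0, 0, 0], -1), ([1, 0, 1], -1),
]

pw, pb = train_perceptron(samples)
sw, sb = train_linear_svm(samples)
perceptron_preds = [predict(pw, pb, x) for x, _ in samples]
svm_preds = [predict(sw, sb, x) for x, _ in samples]
print("perceptron:", pw, pb)
print("svm:       ", sw, sb)
```

The only structural difference is the update condition and the shrink step:
the perceptron stops caring about a sample as soon as it is on the right
side, while the hinge-loss version keeps pushing until the sample clears a
margin and penalizes large weights along the way. Whether that actually buys
robustness on our data is exactly the question.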

What about a technique like RFE (recursive feature elimination) along with
SVM to select rules to be eliminated, rather than only looking at individual
S/O ratios? Or is that not as applicable here because "spam" is not one
homogeneous class and rules act individually to catch members of different
subclasses of the general class "spam"?
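
For reference, the RFE loop I have in mind looks roughly like the sketch
below: train a linear model, drop the rule with the smallest squared weight,
retrain on what is left, and repeat. The rule names, the samples and the
hyperparameters are all hypothetical, and the trainer is a simple
hinge-loss subgradient descent standing in for a real SVM.

```python
# Toy sketch of recursive feature elimination driven by a linear SVM.
# Rule names, samples and hyperparameters are hypothetical.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(samples, epochs=500, lr=0.05, lam=0.01):
    """Stochastic subgradient descent on lam/2*|w|^2 + hinge loss."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:              # y is +1 (spam) or -1 (ham)
            margin = y * (dot(w, x) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def rfe(samples, rule_names, keep):
    """Repeatedly retrain and drop the rule with the smallest w_i^2."""
    active = list(range(len(rule_names)))  # column indices still in play
    eliminated = []
    while len(active) > keep:
        reduced = [([x[i] for i in active], y) for x, y in samples]
        w, _ = train_linear_svm(reduced)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        eliminated.append(rule_names[active.pop(worst)])
    return [rule_names[i] for i in active], eliminated

rule_names = ["RULE_A", "RULE_B", "RULE_HAM", "RULE_DEAD"]
# RULE_DEAD never fires in this toy sample, so its weight stays at zero.
samples = [
    ([1, 1, 0, 0], 1), ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1),
    ([0, 0, 1, 0], -1), ([0, 0, 0, 0], -1), ([1, 0, 1, 0], -1),
]

kept, eliminated = rfe(samples, rule_names, keep=3)
print("kept:", kept)
print("eliminated, in order:", eliminated)
```

The toy case is deliberately degenerate, since a rule that never fires is
easy to spot by any method. The open question is the one above: a rule that
catches only a small subclass of spam may get a small weight in the joint
model and be ranked low by this criterion even though it is individually
useful.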

Ok, 24 hours until my term paper is due... That's enough distracting myself :-)

 -- sidney
