This is very good. Last semester I wrote a project paper comparing a
single-layer perceptron, like the one we use to score rules, with a
linear-kernel SVM for classifying cancer cells from microarray data.
The conclusion was that single-layer perceptrons are not as bad as
bioinformatics people generally assume, but linear-kernel SVMs are
still a bit better. I expected that we would see similar results in
rule scoring -- i.e., we know that the perceptron performs OK, but we
should see somewhat better scoring from a linear-kernel SVM. It was on
my eventual to-do list to try it out. I'm glad Alexander was able to
do it.

In theory, the primary advantage of an SVM over a perceptron should be
that the rule scores it produces give better results on mail that
arrives after the initial training. The idea is that an SVM's results
are more tolerant of changes in the data over time, and that matters
because spam is always evolving.
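
To make that distinction concrete, here is a minimal Python sketch. It
is not the SpamAssassin perceptron and not Alexander's code; the toy
data, the function names, and the learning-rate and regularization
values are all made up for illustration. A perceptron stops updating as
soon as every training example is on the right side of the line, while
a linear SVM (trained here by plain subgradient descent on the
regularized hinge loss, one of several ways to fit one) keeps pushing
examples away from the boundary. A wider margin is the usual argument
for why the SVM should cope better with data that drifts.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy data: [rule_a hit, rule_b hit, bias], +1 = spam, -1 = ham.
DATA = [
    ([1.0, 0.0, 1.0], +1),
    ([1.0, 1.0, 1.0], +1),
    ([0.0, 1.0, 1.0], -1),
    ([0.0, 0.0, 1.0], -1),
]

def train_perceptron(data, epochs=50, lr=1.0):
    """Classic perceptron: update only when an example is misclassified."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            if y * dot(w, x) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def train_linear_svm(data, epochs=500, lr=0.01, lam=0.01):
    """Linear SVM sketch: subgradient descent on L2-regularized hinge loss."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            margin = y * dot(w, x)
            # Shrink the weights (the regularization term).
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Update whenever the example is inside the margin,
            # not only when it is misclassified.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def min_margin(w, data):
    """Smallest signed distance from any example to the decision boundary."""
    norm = sum(wi * wi for wi in w) ** 0.5
    return min(y * dot(w, x) / norm for x, y in data)

if __name__ == "__main__":
    for name, train in (("perceptron", train_perceptron),
                        ("linear SVM", train_linear_svm)):
        w = train(DATA)
        print(name, [round(wi, 2) for wi in w], round(min_margin(w, DATA), 3))
```

Both sets of weights separate the toy data; the thing to compare is the
minimum margin each one leaves. Whether that carries over to real rule
scoring is exactly what would need measuring on a real corpus.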

Questions that I have regarding Alexander's SVM:

Alexander, did you also run the SpamAssassin perceptron and compare the
results from its rule scores with the results from the SVM's scores?

How does the speed of the SVM in Perl compare with the perceptron in C?
It's OK if it is much slower, but it would be good to know that it is
still practical to run rule scoring when we do a release.

For anyone with an opinion: The SVM code in Alexander's program says
that it is a Perl port he did of the SVM code in WEKA, which is written
in Java. WEKA is licensed under the GPL. Does anyone have an informed
opinion on whether porting a Java program to Perl makes it a derivative
work for copyright purposes, and whether we would have licensing issues
with that?

There is a CPAN module, Algorithm::SVM, which is a Perl interface to the
libsvm library. That would provide C-speed performance. libsvm is very
high performance, actively maintained, and probably the most widely
used SVM code out there. It would not present the same licensing
questions. (http://www.csie.ntu.edu.tw/~cjlin/libsvm/COPYRIGHT)

Alexander, what do you think of calling Algorithm::SVM instead of the
code that you ported?

 -- sidney
