Monday
April 30
4:00 - 4:50 PM 
Kelley 1001

 

Rich Caruana 
Assistant Professor
Dept. of Computer Science
Cornell University

 

Which Supervised Learning Method Works Best for What? An Empirical
Comparison of Learning Methods and Metrics++

 

Decision trees are intelligible, but do they perform well enough that
you should use them? Have SVMs replaced neural nets, or are neural nets
still best for regression, and SVMs best for classification? Boosting
maximizes margins much as SVMs do, but can boosting compete with SVMs?
And if it does compete, is it better to boost weak models, as theory
might suggest, or to boost stronger models? Bagging is simpler than
boosting -- how well does bagging stack up against boosting? Breiman
said Random Forests are better than bagging and as good as boosting. Was
he right? And what about old friends like logistic regression, KNN, and
naive Bayes? Should they be relegated to the history books, or do they
still fill important niches? In this talk we compare the performance of
these supervised learning methods on a number of performance criteria:
Accuracy, F-score, Lift, Precision/Recall Break-Even Point, Area under
the ROC, Average Precision, Squared Error, Cross-Entropy, and
Probability Calibration. The results show that no one learning method
does it all, but some methods can be "repaired" so that they do very
well across all performance metrics. In particular, we show how to
obtain the best probabilities from max margin methods such as SVMs and
boosting via Platt's Method and isotonic regression. We then describe a
new ensemble method that combines select models from these ten learning
methods to yield much better performance. Although these ensembles
perform extremely well, they are too complex for many applications.
We'll describe a model compression method we are developing to fix that.
Finally, if time permits (it probably won't), we'll discuss how the
performance metrics relate to each other, and which of them you probably
should (or shouldn't) use.
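
As a rough illustration of the two calibration techniques named in the
abstract, here is a minimal sketch in plain Python. It is not the
speaker's implementation. The function names, the toy scores, and the
gradient-descent settings are all assumptions made for this example, and
the Platt fit omits the target smoothing used in Platt's original
formulation. Both functions map raw classifier scores (for example, SVM
margins) to probability estimates.

```python
import math


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * s + b) to 0/1 labels by gradient descent on log loss.

    Simplified sketch: no target smoothing, fixed step size (assumptions).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            # Derivative of log loss w.r.t. (a * s + b) is (p - y).
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_platt(params, s):
    a, b = params
    return 1.0 / (1.0 + math.exp(-(a * s + b)))


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: a non-decreasing step function of the score.

    Returns a list of (upper_score, probability) steps. Tied scores get no
    special handling in this sketch.
    """
    blocks = []  # each block is [sum_of_labels, count, max_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge while the previous block's mean is not below the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def predict_isotonic(model, s):
    for upper, value in model:
        if s <= upper:
            return value
    return model[-1][1]  # scores above the training range get the top step


if __name__ == "__main__":
    # Toy, made-up scores and labels purely for illustration.
    scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]
    platt = fit_platt(scores, labels)
    iso = fit_isotonic(scores, labels)
    for s in (-2.0, 0.0, 2.0):
        print(s, predict_platt(platt, s), predict_isotonic(iso, s))
```

Platt's Method fits a two-parameter sigmoid and so assumes a particular
shape for the score-to-probability map. Isotonic regression assumes only
that the map is monotone, which is more flexible but generally needs
more calibration data.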

 

Biography:

 

Rich Caruana is an Assistant Professor of Computer Science at Cornell
University. He received his Ph.D. from CMU in 1997, where he worked with Tom
Mitchell and Herb Simon. Before joining the faculty at Cornell in 2001, he
was on the faculty in the Medical School at UCLA and at CMU's Center for
Automated Learning and Discovery (CALD). Rich's research is in machine learning
and data mining, and applications of these to medical decision making,
bioinformatics, and weather forecasting. He is best known for his work
in inductive transfer, semi-supervised learning, and optimizing learning
for different performance criteria. Rich likes to mix algorithm
development with applications work to ensure that the methods he
develops really work in practice.
