Friday
November 5         *** Note the special date and time ***
11:00 AM - 12:00 PM
Kelley 1001


Burr Settles 
Postdoctoral Fellow 
Machine Learning Department
Carnegie Mellon University


Asking the Right Questions: New Query Modes in Active Learning

The key idea behind active learning is that a machine learning algorithm can 
achieve greater accuracy with less training if it is allowed to choose the data 
from which it learns. In this talk, I present two recent active learning 
paradigms in which learning algorithms may pose novel types of "queries" to 
human annotators to great effect. We call these new paradigms 
"multiple-instance active learning" and "feature active learning." In 
traditional active learning, a partially trained model selects new data 
instances to be labeled by a human annotator; these are then added to the 
training set, and the process repeats. In a text classification task, for 
example, the learner might query for the labels of informative-looking 
documents. However, having a human read an entire document can be an 
inefficient use of time, particularly when only certain passages or keywords 
are relevant to the task at hand. Multiple-instance active learning addresses 
this problem by allowing the model to selectively obtain more focused labels 
at the passage level in cases where 
noisy document-level labels might be available (e.g., from hyperlinks or 
citation databases). Querying at the passage level provides a direct training 
signal to the learner, and passages are less cumbersome for humans to read 
than whole documents. Likewise, 
feature active learning allows the learner to query for the labels of salient 
words (e.g., the query word "puck" might be labeled "hockey" in a sports 
article classification task), which naturally exploits the annotator's inherent 
domain knowledge. We show that such alternative query paradigms, especially 
when combined with intuitive user interfaces, can make more efficient use of 
human annotation effort. [Joint work with Mark Craven, Soumya Ray, Gregory 
Druck, and Andrew McCallum.]
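For readers new to the area, the traditional loop described above (select an instance, have it labeled, retrain, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's method: it assumes a toy one-dimensional pool of integers, a simulated annotator (the hypothetical `oracle` function), a simple threshold classifier, and uncertainty sampling (query the unlabeled point closest to the current decision boundary) as the selection strategy. The names `fit_threshold` and `active_learn` are hypothetical.

```python
from typing import Callable, List, Tuple


def fit_threshold(labeled: List[Tuple[int, int]], lower: int, upper: int) -> float:
    """Fit a 1-D threshold classifier: the midpoint between the largest
    negative example and the smallest positive example labeled so far.
    Falls back to the pool bounds when a class has not been seen yet."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else lower
    hi = min(positives) if positives else upper
    return (lo + hi) / 2


def active_learn(
    pool: List[int], oracle: Callable[[int], int], budget: int
) -> Tuple[float, List[int]]:
    """Pool-based active learning with uncertainty sampling (toy sketch).

    Each round: retrain on the labeled set, query the unlabeled instance
    nearest the decision boundary, and add its label to the training set."""
    lower, upper = min(pool), max(pool)
    labeled: List[Tuple[int, int]] = []
    unlabeled = list(pool)  # kept in pool order; min() breaks ties by order
    queried: List[int] = []
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled, lower, upper)
        # Most uncertain instance = closest to the current boundary.
        x = min(unlabeled, key=lambda u: abs(u - threshold))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))  # ask the (simulated) annotator
        queried.append(x)
    return fit_threshold(labeled, lower, upper), queried


if __name__ == "__main__":
    # Hypothetical task: the true label is 1 for x >= 60 in a pool of 0..100.
    threshold, queried = active_learn(list(range(101)), lambda x: int(x >= 60), budget=6)
    print(threshold, queried)  # 59.5 [50, 75, 62, 56, 59, 60]
```

In this toy setting the queries behave like a binary search, so six labels out of 101 are enough to pin the boundary between 59 and 60. The query modes in the talk change what is asked (passages or words rather than whole instances), not this basic select-label-retrain cycle.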


Biography

Burr Settles is a Postdoctoral Fellow in the Machine Learning Department at 
Carnegie Mellon University. He received his PhD from the University of 
Wisconsin-Madison in 2008 with a major in Computer Sciences and minors in 
Linguistics and Biology. His current research focuses on 
maximizing the use of unlabeled data and minimizing the cost of obtaining 
labeled data for applications in natural language processing and 
bioinformatics. He also runs the website FAWM.ORG, prefers sandals to shoes, 
and plays guitar in the Pittsburgh pop band Delicious Pastries.
_______________________________________________
Colloquium mailing list
https://secure.engr.oregonstate.edu/mailman/listinfo/colloquium