wiss-org-19.3.-18.5.10: EVENT: SIGIR-2010 Workshop on Feature Generation and Selection for Information Retrieval

Ohly, H. Peter Thu, 27 May 2010 06:22:40 -0700

-----Original Message-----
From: Evgeniy Gabrilovich [mailto:g...@yahoo-inc.com] 
Sent: Saturday, May 15, 2010 12:00 AM
To: ml-n...@googlegroups.com; u...@engr.orst.edu; u...@eecs.oregonstate.edu; 
a...@aclweb.org; m...@ics.uci.edu; sigir-announce-requ...@acm.org; 
m...@isle.org; machine-learn...@yahoogroups.com; irl...@lists.shef.ac.uk; 
topic-mod...@lists.cs.princeton.edu; bayes-n...@stat.cmu.edu; 
research...@pascal-network.org
Subject: [SIG-IRList] 2nd CFP: SIGIR-2010 Workshop on Feature Generation and 
Selection for Information Retrieval


================================================================
Second Call for Papers

Feature Generation and Selection for Information Retrieval Workshop at the 33rd 
Annual ACM SIGIR Conference (SIGIR 2010)

http://alex.smola.org/workshops/sigir10/

July 23, 2010
Geneva, Switzerland

SUBMISSIONS DUE MAY 30, 2010
================================================================

We solicit submissions for the Workshop on Feature Generation and Selection for 
Information Retrieval, to be held on July 23, 2010, in Geneva, Switzerland, in 
conjunction with the 33rd Annual International ACM SIGIR Conference on Research 
and Development in Information Retrieval (SIGIR 2010). The workshop will bring 
together researchers and practitioners from academia and industry to discuss 
the latest developments in various aspects of feature generation and selection 
for textual information retrieval.

Modern information retrieval systems facilitate information access at 
unprecedented scale and level of sophistication.
However, in many cases the underlying representation of text remains quite 
simple, often limited to using a weighted bag of words. Over the years, several 
approaches to automatic feature generation have been proposed (such as Latent 
Semantic Indexing, Explicit Semantic Analysis, Hashing, and Latent Dirichlet 
Allocation), yet their application in large-scale systems remains the 
exception rather than the rule. On the other hand, numerous studies in NLP and 
IR resort to manually crafting features, which is a laborious and expensive 
process. Such studies often focus on one specific problem, so many of the 
features they define are task- or domain-dependent. As a result, little 
knowledge transfers to other problem domains, which limits our 
understanding of how to reliably construct informative features for new tasks.
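
As a concrete reference point for the "weighted bag of words" mentioned above, 
here is a minimal sketch in Python. It assumes whitespace tokenization and a 
plain tf * log(N/df) weighting; the function name and the toy documents are 
illustrative only and are not taken from the workshop materials.

```python
import math
from collections import Counter

def tfidf_bag_of_words(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

# Toy corpus (hypothetical).
docs = ["retrieval of text", "text features for retrieval", "feature selection"]
vectors = tfidf_bag_of_words(docs)
```

Note that "feature" and "features" end up as unrelated dimensions here, which 
is one small illustration of why richer, generated features are of interest.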

An area of machine learning concerned with feature generation (or constructive 
induction) studies methods that endow computers with the ability to modify or 
enhance the representation language. Feature generation techniques search for 
new features that describe the target concepts better than the attributes 
supplied with the training instances. It is worth noting that traditional 
machine learning data sets, such as those available from the UCI data 
repository, are available only as feature vectors, so their feature set is 
essentially fixed. In fact, feature generation for specific UCI benchmark 
datasets is frowned upon. On the other hand, textual data is almost always 
available in its raw format (in some cases as structured data with sufficient 
side information). Given the importance of text as a data format, it is well 
worth designing text-specific feature generation algorithms.

Complementary to feature generation, the issue of feature selection arises. It 
aims to retain only the most informative features, e.g., in order to reduce 
noise and to avoid overfitting, and is essential when numerous features are 
automatically constructed. Such features are often correlated, redundant, or 
uninformative, and a principled selection process lets us prune them.
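
To make the idea of feature selection concrete, the following is a minimal 
sketch in Python that ranks terms by one possible dependency measure, the 
mutual information between a term's presence and a class label, and keeps 
the top k. The choice of measure, the function names, and the toy data are 
assumptions made for illustration; the call for papers does not prescribe 
any particular method.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(ys)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def select_features(docs, labels, k):
    """Keep the k terms whose presence is most informative about the label."""
    token_sets = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*token_sets))
    scores = {t: mutual_information([t in s for s in token_sets], labels)
              for t in vocab}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

# Toy labeled corpus (hypothetical).
docs = ["cheap pills online", "cheap offer online",
        "meeting agenda online", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
selected = select_features(docs, labels, k=2)
```

On this toy data the terms "cheap" and "meeting" each separate the two 
classes perfectly and are the ones retained, while "online" appears in both 
classes and is dropped. Scoring each term independently does not account for 
redundancy between correlated features.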

We believe that much can be done in the quest for automatic feature generation 
for text processing, for example by using large-scale knowledge bases as well 
as the sheer amount of textual data readily accessible today. We further believe 
the time is ripe to bring together researchers from many related areas 
(including information retrieval, machine learning, statistics, and natural 
language processing) to address these issues and seek cross-pollination among 
the different fields.

Papers from a rich set of empirical, experimental, and theoretical perspectives 
are invited. Topics of interest for the workshop include but are not limited to:
- Identifying cases when new features should be constructed
- Knowledge-based methods (including identification of appropriate knowledge 
resources)
- Efficiently utilizing human expertise (akin to active learning, assisted 
feature construction)
- (Bayesian) nonparametric distribution models for text (e.g., LDA, hierarchical 
Pitman-Yor model)
- Compression and autoencoder algorithms (e.g., information bottleneck, deep 
belief networks)
- Feature selection (L1 programming, message passing, dependency measures, 
submodularity)
- Cross-language methods for feature generation and selection
- New types of features, e.g., spatial features to support geographical IR
- Applications of feature generation in IR (e.g., constructing new features for 
indexing, ranking)

The workshop will include invited talks as well as presentations of accepted 
research contributions. The schedule will provide time for both organized and 
open discussion.
Registration will be open to all SIGIR 2010 attendees.

Submission Instructions
=======================

Submissions should report new (unpublished) research results or ongoing 
research. Submissions can be up to 8 pages long for full papers, and up to 4 
pages long for short papers. Papers should be formatted in double-column ACM 
SIG proceedings format 
(http://www.acm.org/sigs/publications/proceedings-templates;
for LaTeX, use "Option 2"). Papers must be in English and must be submitted as 
PDF files.

Papers should be submitted electronically using the EasyChair system at 
http://www.easychair.org/conferences/?conf=fgsir10 no later than 23:59 Pacific 
Standard Time, Sunday, May 30, 2010.

At least one author of each accepted paper will be expected to attend and 
present their findings at the workshop.

Important Dates
===============
Submission Deadline:     May 30,  2010
Acceptance notification: June 25, 2010
Camera-ready submission: July 5,  2010
Workshop date:           July 23, 2010

Invited Speakers
================

- Dr. Kenneth Church, Chief Scientist of the Human Language
  Technology Center of Excellence at the Johns Hopkins University
- Dr. Yee Whye Teh, Lecturer at the Gatsby Computational
  Neuroscience Unit, University College London

Organizing Committee
====================
- Evgeniy Gabrilovich, Yahoo! Research, USA
- Alex Smola, Australian National University and Yahoo! Research, USA
- Naftali Tishby, Hebrew University of Jerusalem, Israel

Program Committee
=================
- Francis Bach, INRIA, France
- Misha Bilenko, Microsoft Research, USA
- David Blei, Princeton, USA
- Karsten Borgwardt, Max Planck Institute, Germany
- Wray Buntine, NICTA, Australia
- Raman Chandrasekar, Microsoft Research, USA
- Kevyn Collins-Thompson, Microsoft Research, USA
- Silviu Cucerzan, Microsoft Research, USA
- Brian Davison, Lehigh University, USA
- Gideon Dror, Academic College of Tel-Aviv-Yaffo, Israel
- Arkady Epshteyn, Google, USA
- Wai Lam, CUHK, Hong Kong SAR, China
- Tie-Yan Liu, Microsoft Research Asia, China
- Shaul Markovitch, Technion, Israel
- Donald Metzler, USC/ISI, USA
- Daichi Mochihashi, NTT, Japan
- Patrick Pantel, Yahoo, USA
- Filip Radlinski, Microsoft Research, United Kingdom
- Rajat Raina, Facebook, USA
- Pradeep Ravikumar, University of Texas at Austin, USA
- Mehran Sahami, Stanford, USA
- Le Song, CMU, USA
- Krysta Svore, Microsoft Research, USA
- Volker Tresp, Siemens, Germany
- Eric Xing, CMU, USA
- Kai Yu, NEC, USA
- ChengXiang Zhai, UIUC, USA
- Jerry Zhu, University of Wisconsin, USA

