Apologies if multiple copies are received.

Call for Papers:

----------------------------------------------------------------------

NIPS 2008 WORKSHOP on LEARNING FROM MULTIPLE SOURCES
http://web.mac.com/davidrh/LMSworkshop08/
http://nips.cc/

----------------------------------------------------------------------

BACKGROUND

While the machine learning community has primarily focused on analysing the output of a single data source, there have been relatively few attempts to develop a general framework, or heuristics, for analysing several data sources in terms of a shared dependency structure. Learning from multiple data sources (also known as the data fusion problem) is a timely research area. Owing to the increasing availability and sophistication of data recording techniques, and to advances in data analysis algorithms, there exist many scenarios in which it is necessary to model multiple, related data sources, e.g. in fields such as bioinformatics, multimodal signal processing, and information retrieval. The relevance of this research area is inspired by the human brain's ability to integrate five different sensory input streams into a coherent representation of its environment.

The open question is how to analyse data that consist of more than one set of observations (or 'views') of the same phenomenon. In general, existing methods take a discriminative approach, in which a set of features for each data set is found so as to explicitly optimise some dependency criterion. Such approaches include canonical correlation analysis (Hotelling, 1936), a standard statistical technique for modelling two data sources, and its multiset variant (Kettenring, 1971), which find linearly correlated features between data sets; kernel variants (Lai and Fyfe, 2000; Bach and Jordan, 2002; Hardoon et al., 2004); and approaches that optimise the mutual information between extracted features (Becker, 1996; Chechik et al., 2003). However, discriminative approaches may be ad hoc, may require regularisation to ensure that erroneous shared features are not discovered, and make it difficult to incorporate prior knowledge about the shared information. Generative probabilistic approaches address these problems by jointly modelling each data stream as the sum of a shared component and a 'private' component that captures the within-set variation (Bach and Jordan, 2005; Leen and Fyfe, 2006; Klami and Kaski, 2006).
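For concreteness, the following is a minimal numpy sketch of classical CCA via its generalised eigenvalue formulation; the function name, the ridge term 'reg', and all variable names are illustrative assumptions rather than notation from the cited papers.

import numpy as np

def cca_first_pair(X, Y, reg=1e-3):
    # X: (n, p) and Y: (n, q); rows are paired observations of the two views.
    n = X.shape[0]
    X = X - X.mean(axis=0)                        # centre both views
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # ridge-regularised covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # The first canonical direction wx solves
    #   Cxx^{-1} Cxy Cyy^{-1} Cyx wx = rho^2 wx.
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    i = int(np.argmax(vals.real))
    wx = vecs[:, i].real
    wy = np.linalg.solve(Cyy, Cxy.T @ wx)         # matching direction for Y
    wy /= np.linalg.norm(wy)
    rho = float(np.sqrt(max(vals[i].real, 0.0)))  # first canonical correlation
    return wx, wy, rho

The projections X @ wx and Y @ wy are then maximally linearly correlated; the kernel variants cited above replace these linear projections with nonlinear feature mappings.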

These approaches assume a simple relationship between (two) data sources, i.e. a so-called 'flat' data structure in which the data consist of N independent pairs of related variables; in practice, however, related data sources may exhibit extremely complex co-variation (for instance, audio and visual streams relating to the same video). A potential solution is a fully probabilistic approach, which could be used to impose structured variation within and between data sources. Additional methodological challenges include determining what 'useful' information we are trying to learn from the multiple sources, and building models for predicting one data source given the others. Alongside the unsupervised learning of multiple data sources outlined above, there is the closely related problem of multitask learning (Bickel et al., 2008), or transfer learning, in which a task is learned from other related tasks.
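As a toy illustration of the shared-plus-private decomposition discussed above (in the spirit of the generative models cited in the previous paragraph), two 'flat' views can be generated from a common latent source as follows; all dimensions, noise levels, and names are arbitrary assumptions made for the sketch.

import numpy as np

rng = np.random.default_rng(0)
n = 500                                # number of paired observations
d_shared, d1, d2 = 2, 3, 3             # latent dimensionalities (arbitrary)
p1, p2 = 10, 8                         # observed dimensionalities (arbitrary)

z  = rng.normal(size=(n, d_shared))    # shared latent source
z1 = rng.normal(size=(n, d1))          # 'private' latent source, view 1
z2 = rng.normal(size=(n, d2))          # 'private' latent source, view 2

W1 = rng.normal(size=(d_shared, p1))   # shared-component loadings
W2 = rng.normal(size=(d_shared, p2))
B1 = rng.normal(size=(d1, p1))         # private-component loadings
B2 = rng.normal(size=(d2, p2))

# Each view = shared signal + within-set ('private') signal + noise;
# only z induces correlation between X1 and X2.
X1 = z @ W1 + z1 @ B1 + 0.1 * rng.normal(size=(n, p1))
X2 = z @ W2 + z2 @ B2 + 0.1 * rng.normal(size=(n, p2))

The structured, non-'flat' co-variation described above corresponds to relaxing the assumption that the n rows are independent draws.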

WORKSHOP

The aim of the workshop is to promote discussion amongst leading machine learning and applied researchers about learning from multiple, related sources of data, with a focus on both methodological issues and applied research problems.

Topics of the workshop include (but are not limited to):
- unsupervised learning (generative / discriminative modelling) of multiple related data sources
- canonical correlation analysis-type methods
- data fusion for real-world applications, such as bioinformatics, sensor networks, multimodal signal processing, and information retrieval
- multitask / transfer learning
- multiview learning

INVITED SPEAKERS

Prof. Michael Jordan
University of California, Berkeley
http://www.cs.berkeley.edu/~jordan/

Dr. Francis Bach
École normale supérieure
http://www.di.ens.fr/~fbach/

Dr. Tobias Scheffer
Max-Planck-Institut für Informatik
http://www.mpi-inf.mpg.de/~scheffer/

ORGANISERS

David R. Hardoon        (University College London)
Gayle Leen              (Helsinki University of Technology)
Samuel Kaski            (Helsinki University of Technology)
John Shawe-Taylor       (University College London)

PROGRAM COMMITTEE

Andreas Argyriou        (University College London)
Tom Diethe              (University College London)
Colin Fyfe              (University of the West of Scotland)
Jaakko Peltonen         (Helsinki University of Technology)

SUBMISSIONS

We invite the submission of high-quality extended abstracts (2 to 4 pages) in the NIPS style (http://nips.cc/PaperInformation/StyleFiles). Abstracts should be sent (in .pdf or .ps format) to [EMAIL PROTECTED], [EMAIL PROTECTED].

A selection of the submitted abstracts will be accepted as either oral or poster presentations. The best abstracts will be considered for extended versions in the workshop proceedings, and possibly for a special issue of a journal.

IMPORTANT DATES

24 Oct 08 Submission deadline for extended abstracts
28 Oct 08 Notification of acceptance
13 Dec 08 Workshop at NIPS 08, Whistler, Canada


REFERENCES

BACH, F.R., & JORDAN, M.I. 2002. Kernel Independent Component Analysis. Journal of Machine Learning Research, 3, 1-48.
BACH, F.R., & JORDAN, M.I. 2005. A Probabilistic Interpretation of Canonical Correlation Analysis. Tech. rept. 688. Dept of Statistics, University of California, Berkeley.
BECKER, S. 1996. Mutual Information Maximization: Models of Cortical Self-Organisation. Network: Computation in Neural Systems, 7, 7-31.
BICKEL, S., BOGOJESKA, J., LENGAUER, T., & SCHEFFER, T. 2008. Multi-task Learning for HIV Therapy Screening. In: Proceedings of the 25th International Conference on Machine Learning (ICML).
CHECHIK, G., GLOBERSON, A., TISHBY, N., & WEISS, Y. 2003. Information Bottleneck for Gaussian Variables. Pages 1213-1220 of: THRUN, S., SAUL, L.K., & SCHÖLKOPF, B. (eds), Advances in Neural Information Processing Systems, vol. 16.
HARDOON, D.R., SZEDMAK, S., & SHAWE-TAYLOR, J. 2004. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12), 2639-2664.
HOTELLING, H. 1936. Relations Between Two Sets of Variates. Biometrika, 28, 321-377.
KETTENRING, J.R. 1971. Canonical Analysis of Several Sets of Variables. Biometrika, 58(3), 433-451.
KLAMI, A., & KASKI, S. 2006. Generative Models that Discover Dependencies Between Two Data Sets. Pages 123-128 of: MCLOONE, S., ADALI, T., LARSEN, J., VAN HULLE, M., ROGERS, A., & DOUGLAS, S.C. (eds), Machine Learning for Signal Processing XVI. IEEE.
LAI, P.L., & FYFE, C. 2000. Kernel and Nonlinear Canonical Correlation Analysis. International Journal of Neural Systems, 10(5), 365-377.
LEEN, G., & FYFE, C. 2006. A Gaussian Process Latent Variable Model Formulation of Canonical Correlation Analysis. Pages 413-418 of: Proceedings of the 14th European Symposium on Artificial Neural Networks (ESANN).
