Date: Wed 19 Oct 2005 19:59:00 GMT+02:00
Subject: IRIDIA Seminar on Thu 27 Oct 2005 at 03:00 PM
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
I R I D I A S E M I N A R
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Who:
Universite catholique de Louvain
What:
Distances on graphs, and the exploration/exploitation trade-off
in reinforcement learning
When:
Thu 27 Oct 2005 from 03:00 PM for 60 min (+questions)
Where:
IRIDIA
Université Libre de Bruxelles
Building C, Floor 5
87 av. Adolph Buyl
Brussels, Belgium
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Distances on graphs, and the exploration/exploitation
trade-off in reinforcement learning
Marco Saerens
Universite catholique de Louvain
Abstract
This work presents some general procedures for computing
dissimilarities/similarities between elements of a database or, more
generally, nodes of a weighted, undirected, graph. It is based on a
Markov-chain model of random walk through the database. The model
assigns transition probabilities to the links between elements, so that
a random walker can jump from element to element. A quantity, called the
average first-passage time, computes the average number of steps needed
by a random walker for reaching element k for the first time, when
starting from element i. Another closely related quantity, called the
average commute time, provides a distance measure between any pair of
elements. Yet another quantity of interest, closely related to the
average commute time, is the pseudoinverse of the Laplacian matrix of
the graph, which represents a similarity measure between the nodes of
the graph, and is a kernel in the "support vector machine" sense. Unlike
the standard "Dijkstra" or "shortest path" distance, these quantities,
representing dissimilarities (similarities) between any two elements,
have the nice property of decreasing (increasing) when the number of
paths connecting these two elements increases and when the "length" of
any path decreases. We then define the principal component analysis
(PCA) of a graph as the subspace projection that preserves as much
variance as possible, in terms of the commute time distance. This PCA
has some interesting links with spectral graph theory, in particular
"spectral clustering". Finally, we generalize this work by defining
another distance measure between the nodes of a graph, issued from the
concept of reinforcement learning.First of all, we define the concept of
degree of exploration from a state as the entropy related to the
probability distribution of the set of admissible actions in this state.
This entropy value allows to control the degree of exploration linked to
this state, and should be provided by the user. Then, we restate the
exploration/exploitation problem as a global optimization problem:
define the best exploration strategy that minimizes the expected
cumulated cost, for fixed degrees of exploration. This formulation leads
to a set of nonlinear updating rules reminiscent from the ``value
iteration'' algorithm. Interestingly enough, when the degree of
exploration is zero for all states (no exploration), these equations
reduce to Bellman's equations for finding the shortest path while, when
it is maximum, a full ``blind'' exploration i s performed.
Keywords
Graphs, Distance measure, Reinforcement learning.
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/