Begin forwarded message:

From: Mauro Birattari <[EMAIL PROTECTED]>
Date: Wed 19 Oct 2005 19:59:00 GMT+02:00
To: Seminars <[EMAIL PROTECTED]>, Marco Saerens <[EMAIL PROTECTED]>
Subject: IRIDIA Seminar on Thu 27 Oct 2005 at 03:00 PM


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

                     I R I D I A     S E M I N A R

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

Who:
        Marco Saerens <[EMAIL PROTECTED]>
        Universite catholique de Louvain

What:
        Distances on graphs, and the exploration/exploitation trade-off
        in reinforcement learning

When:
        Thu 27 Oct 2005 from 03:00 PM for 60 min (+questions)

Where:
        IRIDIA
        Université Libre de Bruxelles
        Building C, Floor 5
        87 av. Adolph Buyl
        Brussels, Belgium

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/


         Distances on graphs, and the exploration/exploitation
                  trade-off in reinforcement learning

                             Marco Saerens
                    Universite catholique de Louvain

                         [EMAIL PROTECTED]


                                Abstract

This   work   presents   some   general   procedures    for    computing
dissimilarities/similarities between elements of  a  database  or,  more
generally, nodes of a weighted, undirected, graph.  It  is  based  on  a
Markov-chain model of  random  walk  through  the  database.  The  model
assigns transition probabilities to the links between elements, so  that
a random walker can jump from element to element. A quantity, called the
average first-passage time, computes the average number of steps  needed
by a random walker for reaching element  k  for  the  first  time,  when
starting from element i. Another closely related  quantity,  called  the
average commute time, provides a distance measure between  any  pair  of
elements. Yet another quantity  of  interest,  closely  related  to  the
average commute time, is the pseudoinverse of the  Laplacian  matrix  of
the graph, which represents a similarity measure between  the  nodes  of
the graph, and is a kernel in the "support vector machine" sense. Unlike
the standard "Dijkstra" or "shortest path" distance,  these  quantities,
representing dissimilarities (similarities) between  any  two  elements,
have the nice property of decreasing (increasing)  when  the  number  of
paths connecting these two elements increases and when the  "length"  of
any path decreases. We then  define  the  principal  component  analysis
(PCA) of a graph as the  subspace  projection  that  preserves  as  much
variance as possible, in terms of the commute time  distance.  This  PCA
has some interesting links with spectral  graph  theory,  in  particular
"spectral clustering". Finally, we  generalize  this  work  by  defining
another distance measure between the nodes of a graph, issued  from  the
concept of reinforcement learning.First of all, we define the concept of
degree of exploration from  a  state  as  the  entropy  related  to  the
probability distribution of the set of admissible actions in this state.
This entropy value allows to control the degree of exploration linked to
this state, and should be provided by the user.  Then,  we  restate  the
exploration/exploitation  problem  as  a  global  optimization  problem:
define  the  best  exploration  strategy  that  minimizes  the  expected
cumulated cost, for fixed degrees of exploration. This formulation leads
to a set of  nonlinear  updating  rules  reminiscent  from  the  ``value
iteration''  algorithm.  Interestingly  enough,  when  the   degree   of
exploration is zero for all states  (no  exploration),  these  equations
reduce to Bellman's equations for finding the shortest path while,  when
it is maximum, a full ``blind'' exploration i s performed.


                                Keywords

Graphs, Distance measure, Reinforcement learning.

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/



    Carlos Gershenson...
    Centrum Leo Apostel, Vrije Universiteit Brussel
    Krijgskundestraat 33. B-1160 Brussels, Belgium

  “To know your limits you need to go beyond them”

Reply via email to