I'm rather rusty on graph theory, but I'm in a mood to take a whack at
this anyway...

Use any convenient data structure to represent your graph.  The nodes
are the songs, the edges are the "relationships" between songs.  You can
can model the relationships in several ways.  Starting with song A
1) As described, pick out the one "most similar" song B from a subset of
the other nodes.  This increases the weight of edge AB by one.
2) You can have the user weight the similarity of B1,B2,...Bn to A.  The
final weight of AB is the average weight given by users who examined those
two songs together.
3) Probably a bunch more.

There are two algorithms to play with.  Finding connected components (there
is a non-zero weight path between every node) and finding cliques (there
is a non-zero weight edge between each node).  My guess is that connectedness
will be more interesting than cliques.  You can adjust the sensitivity
of the algorithm (by reducing the weights of edges, or explicitly building
a threshold into the algorithm) to find more-strongly linked regions.

Another possibility (especially if you are using percentage weights, like
in scenario two) is to play with distance algorithms.  Invert the weights,
so that two completely dissimilar songs are at infinite distance, and
closely related songs have low edge weights.  Compute shortest paths.
This could be combined with connected components to pick out the
"exemplar" of the component: take the song with the shortest total distance
to all others in the component.

The details of the algorithms can be found in any algorithm text book.

--kag
_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm

Reply via email to