Hierarchical modeling techniques work well on structures like this if you have good resolution of your meta-data. Resolving and disambiguating artist and track names can be difficult unless you have total control over the meta-data source.
The basic idea is that you model an artist as a distribution over "concept space", which is just a fancy name for latent variables you don't plan to understand. Then an album is sampled from the artist's distribution and is itself another distribution, and finally a track is sampled from the album.

This is similar to the way that in LDA, documents and words are distributions over your latent concept variables. Specific meanings are chosen at each point in a document, and the word you observe is chosen based on the concept at that point. Since you only observe which word appears in which document, you have to reverse-engineer what the latent concepts might have been by finding a compromise between the word and document distributions.

In your case, you have a simpler generative model, but similar techniques should apply.

On Sat, Jun 13, 2009 at 8:53 AM, Karl Wettin <[email protected]> wrote:

> I hope that some semi-sophisticated Album, Track and ArtistSimilarity can
> be used to improve the results.
>
> Perhaps it's a good idea to have Playlist, Album and Artist implemented as
> Item too.

--
Ted Dunning, CTO
DeepDyve
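P.S. A minimal sketch of the artist -> album -> track generative story described above, in plain Python. Everything specific here is an assumption made for illustration: the choice of Dirichlet distributions at each level, the helper names (`dirichlet`, `sample_child`, `generate_catalog`), and the concentration values 20.0 and 50.0. It only shows the forward (sampling) direction; fitting such a model to real listening data would need an inference procedure that is not shown.

```python
import random


def dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_child(parent, concentration, rng):
    """Sample a child distribution centered on its parent distribution.

    A larger concentration keeps the child closer to the parent.
    (Assumed parameterization, chosen only for this sketch.)
    """
    # A small floor keeps every Gamma shape parameter strictly positive.
    alpha = [concentration * p + 1e-6 for p in parent]
    return dirichlet(alpha, rng)


def generate_catalog(n_concepts=5, n_albums=2, n_tracks=3, seed=0):
    """Generate one artist with albums and tracks over latent concepts."""
    rng = random.Random(seed)
    # The artist is a distribution over the latent "concept space".
    artist = dirichlet([1.0] * n_concepts, rng)
    catalog = {"artist": artist, "albums": []}
    for _ in range(n_albums):
        # Each album is sampled from the artist and is itself a distribution.
        album = sample_child(artist, 20.0, rng)
        # Each track is sampled from its album.
        tracks = [sample_child(album, 50.0, rng) for _ in range(n_tracks)]
        catalog["albums"].append({"album": album, "tracks": tracks})
    return catalog


if __name__ == "__main__":
    catalog = generate_catalog()
    print("artist:", [round(p, 3) for p in catalog["artist"]])
    for i, album in enumerate(catalog["albums"]):
        print("album", i, [round(p, 3) for p in album["album"]])
        for j, track in enumerate(album["tracks"]):
            print("  track", j, [round(p, 3) for p in track])
```

Tracks on the same album end up with similar concept mixtures, and albums by the same artist are more loosely similar, which is the kind of structure an Album, Track or Artist similarity could exploit.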
