Thanks, Owen. My next question is about this step from the tutorial:
Edit recommender.properties and fill in the recommender.class:

recommender.class=org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender

It seems this line is already in the file, or only needs to be uncommented. My question is: what is "GroupLensRecommender"? I googled it but didn't find any reference to it except the tutorial.

Brian

On Mon, Oct 19, 2009 at 8:47 AM, Sean Owen <[email protected]> wrote:
> You got the "100K" data set, which is quite different for some reason.
> Make sure you nab the 1M data set and the instructions will make
> sense.
>
> The target directory should exist in the tarball, since it exists in
> SVN, but oops, maybe it doesn't for some reason. In any event you can
> just create it.
>
> Yes, the underlying FileDataModel is pretty flexible. The javadoc
> should cover it pretty well: tab- or comma-separated, and it needs the
> first three fields to be user ID, item ID, pref value (if applicable).
>
> It will read the 'u.data' file just fine. However, the example code
> this tutorial references uses a custom implementation, since the
> 1M and 10M data set files use a strange format that needs
> something customized. You could easily dig in to the code and swap in
> FileDataModel for GroupLensDataModel if you want to use the 100K data
> set.
>
> The other data is pretty domain-specific and is not directly relevant
> to a recommender engine. So no, there is nothing that would do anything
> with 'u.item', for instance. However, it would be pretty easy to write,
> for example, a custom ItemSimilarity implementation that reads this
> and deduces some notion of similarity from genre. You could then plug
> that in to a GenericItemBasedRecommender for a fast, and perhaps quite
> effective, recommender.
>
> Ah, perhaps this will be an example in the book ... :)
>
> Sean
>
>
> On Mon, Oct 19, 2009 at 4:27 PM, Brian Wolf <[email protected]> wrote:
> > Hi,
> > I discovered and downloaded Mahout today. Maybe it's just giddiness, but
> > can you help me?
> >
> > This is from the tutorial at http://lucene.apache.org/mahout/taste.html:
> >
> > "
> > 1. Download the "1 Million MovieLens Dataset" from
> >    http://www.grouplens.org/.
> > 2. Unpack the archive and copy ->movies.dat<- and ->ratings.dat<- to
> >    trunk/taste-web/src/main/resources/org/apache/mahout/cf/taste/example/grouplens
> >    under the Mahout distribution directory.
> > "
> >
> > I downloaded the MovieLens data set, but there is no "movies.dat" or
> > "ratings.dat". Are the correct files u.data and u.item?
> > I haven't found any documentation on file formats. There are other things
> > confusing to new users: when I downloaded the gz file and built it with
> > Maven following the instructions, the directory was only partly built;
> > however, when I checked it out with SVN, the full directory structure
> > was built.
> >
> > Can Taste incorporate other data files as well, like the ones listed
> > below (i.e. demographic data, etc.)? Where can I find documentation about
> > the data file formats accepted by Taste, or do I need to dig into the code?
> >
> > Thank you,
> > Brian Wolf
> > developer
> > gOgO deVelopment, ltd
> > Sedona, AZ
> >
> > u.data -- The full u data set, 100000 ratings by 943 users on 1682
> >           items.
> >           Each user has rated at least 20 movies. Users and items are
> >           numbered consecutively from 1. The data is randomly
> >           ordered. This is a tab separated list of
> >           user id | item id | rating | timestamp.
> >           The time stamps are unix seconds since 1/1/1970 UTC
> >
> > u.info -- The number of users, items, and ratings in the u data set.
> >
> > u.item -- Information about the items (movies); this is a tab separated
> >           list of
> >           movie id | movie title | release date | video release date |
> >           IMDb URL | unknown | Action | Adventure | Animation |
> >           Children's | Comedy | Crime | Documentary | Drama | Fantasy |
> >           Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
> >           Thriller | War | Western |
> >           The last 19 fields are the genres, a 1 indicates the movie
> >           is of that genre, a 0 indicates it is not; movies can be in
> >           several genres at once.
> >           The movie ids are the ones used in the u.data data set.
> >
> > u.genre -- A list of the genres.
> >
> > u.user -- Demographic information about the users; this is a tab
> >           separated list of
> >           user id | age | gender | occupation | zip code
> >           The user ids are the ones used in the u.data data set.
> >
> > u.occupation -- A list of the occupations.
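Sean's description of what FileDataModel reads (tab- or comma-separated lines whose first three fields are user ID, item ID and an optional preference value), together with the u.data layout in the README excerpt, can be illustrated with a small plain-Java sketch. It is not FileDataModel's actual code: the class `PreferenceLines`, the record `Pref` and both methods are hypothetical names, accepting either delimiter on every line is a simplification, and marking a missing preference value with NaN is an assumption made only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the FileDataModel-style line layout; not Mahout code. */
class PreferenceLines {

  /** One user-item preference; value is NaN when the line has no preference field (assumption). */
  record Pref(long userID, long itemID, float value) {}

  /**
   * Parses "userID<sep>itemID[<sep>pref[<sep>...]]" where <sep> is a tab or a comma.
   * Any fields after the third (e.g. the u.data timestamp) are ignored by this sketch.
   */
  static Pref parseLine(String line) {
    String[] fields = line.trim().split("[\t,]");
    if (fields.length < 2) {
      throw new IllegalArgumentException("expected at least user ID and item ID: " + line);
    }
    long userID = Long.parseLong(fields[0].trim());
    long itemID = Long.parseLong(fields[1].trim());
    float value = fields.length >= 3 ? Float.parseFloat(fields[2].trim()) : Float.NaN;
    return new Pref(userID, itemID, value);
  }

  /** Counts how many preferences each user has expressed, skipping blank lines. */
  static Map<Long, Integer> countPerUser(List<String> lines) {
    Map<Long, Integer> counts = new HashMap<>();
    for (String line : lines) {
      if (line.isBlank()) {
        continue;
      }
      counts.merge(parseLine(line).userID(), 1, Integer::sum);
    }
    return counts;
  }
}
```

For example, `PreferenceLines.countPerUser(Files.readAllLines(Path.of("u.data")))` on the full 100K file should, per the README, report at least 20 ratings for every user.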
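Sean suggests a custom ItemSimilarity that deduces similarity from genre using u.item. The sketch below shows only the arithmetic such a class might wrap; it does not implement Mahout's ItemSimilarity interface, whose methods are not covered in this thread. The class name `GenreSimilarity` is hypothetical, Jaccard similarity over the 19 genre flags is one possible choice rather than anything Taste prescribes, and the `|` delimiter follows the field listing in the README excerpt (adjust it if your copy of u.item differs).

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical genre-based similarity for MovieLens u.item data. NOT an implementation
 * of Mahout's ItemSimilarity interface, only the computation one might wrap.
 */
class GenreSimilarity {

  /** Fields before the genre flags: movie id, title, release date, video release date, IMDb URL. */
  static final int LEADING_FIELDS = 5;
  static final int GENRE_COUNT = 19;

  private final Map<Long, BitSet> genresByItem = new HashMap<>();

  /** Reads one u.item line, assuming '|' separators as in the README field listing. */
  void addItemLine(String line) {
    // limit -1 keeps empty fields, e.g. a missing video release date
    String[] f = line.split("\\|", -1);
    if (f.length < LEADING_FIELDS + GENRE_COUNT) {
      throw new IllegalArgumentException("too few fields: " + line);
    }
    BitSet genres = new BitSet(GENRE_COUNT);
    for (int g = 0; g < GENRE_COUNT; g++) {
      if ("1".equals(f[LEADING_FIELDS + g].trim())) {
        genres.set(g); // bit g set: movie is in genre g (0 = "unknown", 18 = "Western")
      }
    }
    genresByItem.put(Long.parseLong(f[0].trim()), genres);
  }

  /**
   * Jaccard similarity (size of intersection / size of union) of the two items'
   * genre sets, in [0, 1]. Returns NaN if either item is unknown or neither has a genre.
   */
  double similarity(long itemA, long itemB) {
    BitSet a = genresByItem.get(itemA);
    BitSet b = genresByItem.get(itemB);
    if (a == null || b == null) {
      return Double.NaN;
    }
    BitSet intersection = (BitSet) a.clone();
    intersection.and(b);
    BitSet union = (BitSet) a.clone();
    union.or(b);
    return union.isEmpty() ? Double.NaN : (double) intersection.cardinality() / union.cardinality();
  }
}
```

Plugging this into a GenericItemBasedRecommender would mean adapting it to the ItemSimilarity interface of the Mahout version you have installed; check that interface's javadoc for the exact methods before doing so.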
