On Tue, May 10, 2011 at 12:24 PM, Manuel Blechschmidt
<[email protected]> wrote:
> Hello guys,
> I used a lot of Mahout especially Taste in my Master Thesis: "An architecture
> for evaluating recommender systems in real world scenarios". I wanted to give
> some feedback about it. If somebody is interested in the whole work (97
> pages) drop me an email.
Great, thanks for the kudos. It would be good to post a link to your
work on the user@ list if you like.
> It was especially difficult to get the IDMigrator working. Would be quite cool
> if there would be a DataModel which automatically includes String migration.
This is how it worked originally -- it just doesn't scale nearly as
well. It's really a much better idea to use numeric IDs, so the
framework pushes you that way.
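To make the String-to-numeric idea concrete, here is a minimal sketch,
assuming you hash each String ID down to a long and keep the reverse
mapping in memory. StringIdMapper is a hypothetical name for
illustration, not a Mahout class, and this is not the framework's
implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Mahout): maps String IDs to longs and back.
class StringIdMapper {
  // Reverse mapping; grows by one entry per distinct String ID seen.
  private final Map<Long, String> longToString = new HashMap<>();

  // Hash the String with MD5 and use the first 8 bytes as the numeric ID.
  long toLongID(String stringID) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
      long hash = 0L;
      for (int i = 0; i < 8; i++) {
        hash = (hash << 8) | (digest[i] & 0xFFL);
      }
      longToString.put(hash, stringID); // remember how to translate back
      return hash;
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  // Returns null if this numeric ID was never produced by toLongID.
  String toStringID(long longID) {
    return longToString.get(longID);
  }
}
```

The reverse map is the part that costs you: every distinct String ID
has to be stored somewhere so that results can be translated back,
which numeric IDs avoid entirely.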
> I had some problems that some interfaces did not implement the Serializable
> interface. I already opened a ticket MAHOUT-650.
Yes, interesting issue, though I don't believe a change is called for
in the framework. The issue notes have what I consider the "right" way
to approach this.
> Is there a benchmark engine telling RMSE of the different algorithms? Would
> be cool if a maven command would be available. So when I implement a new
> recommender I can directly benchmark it against the other implementations.
RMSE is not a property of an algorithm alone, but of an algorithm
together with a particular data set, at least. As a result, I don't
think a single built-in benchmark number like this is possible.
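For reference, a minimal sketch of the calculation itself, assuming you
already have predicted and actual ratings paired up for some held-out
data. Rmse is a hypothetical helper name, not a framework class:

```java
// Hypothetical helper: root-mean-square error over paired ratings.
class Rmse {
  static double rmse(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("Need equally sized, non-empty arrays");
    }
    double sumSquares = 0.0;
    for (int i = 0; i < predicted.length; i++) {
      double diff = predicted[i] - actual[i];
      sumSquares += diff * diff; // accumulate squared error per rating
    }
    return Math.sqrt(sumSquares / predicted.length);
  }
}
```

The inputs are the point: the same recommender produces different
predicted/actual pairs on every data set (and every train/test split),
so the number only means something relative to the data it came from.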
> * getNumUsersWithPreferenceFor for the MySQL DataModel only works for at
> most two things and there is no warning if more are supplied
Maybe this has been fixed since you looked, but it does throw an error:
  Preconditions.checkArgument(length != 0 && length <= 2,
      "Illegal number of item IDs: " + length);
> * DataModel expects that there is always only one rating from a user to an
> item (what about reratings?)
Yes, that's true. The most recent rating always counts. It might be
interesting to find a way to factor in re-ratings, but actually
building that into the framework would cause scale problems, and I
don't know of algorithms that use it. So maybe it's better to collapse
multiple ratings into one (weighted average favoring the most recent
one?)
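One way that collapsing could look, as a rough sketch only: an
exponentially decaying weighted average applied before the data is
loaded. RatingCollapser and the decay parameter are hypothetical, made
up for illustration, and not anything in the framework:

```java
// Hypothetical helper: collapse several ratings by one user for one item
// into a single value, weighting newer ratings more heavily.
class RatingCollapser {
  // ratingsOldestFirst: the user's ratings for one item, oldest first.
  // decay: in (0, 1]; each step back in time multiplies the weight by decay.
  static double collapse(double[] ratingsOldestFirst, double decay) {
    if (ratingsOldestFirst.length == 0) {
      throw new IllegalArgumentException("Need at least one rating");
    }
    if (decay <= 0.0 || decay > 1.0) {
      throw new IllegalArgumentException("decay must be in (0, 1]");
    }
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    double weight = 1.0; // the newest rating gets weight 1
    for (int i = ratingsOldestFirst.length - 1; i >= 0; i--) {
      weightedSum += weight * ratingsOldestFirst[i];
      weightTotal += weight;
      weight *= decay; // older ratings count for less
    }
    return weightedSum / weightTotal;
  }
}
```

With decay = 1.0 this is a plain average; as decay approaches 0 it
approaches the current behavior, where only the most recent rating
counts.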
> I also attached some images which should explain how Taste is doing its job
> in my system.
(Images aren't included in mail to @apache.org mailing lists; you'd
have to post them elsewhere.)