On Sun, Sep 9, 2012 at 10:27 PM, Ted Dunning <[email protected]> wrote:
> Great.
>
> If the update has a huge impact on existing code, can you break it into
> manageable pieces?
>
> If it is just an addition, having a big blob of stuff is probably fine.

Now it is integrable into Mahout, and it works with Mahout's existing Recommender interface. It does not modify any existing code, except for a couple of additional lines in driver.class.props, which define a few command-line utilities I find useful while experimenting with a recommender. By the way, I found a few minor bugs and updated the patch. Did you have a chance to look at it?

Secondly, I would like to bump the thread to trigger a discussion on this. Sean raised some concerns about the patch (available on the JIRA page as a comment). Quoting Sean's comment:

"I imagine this is all great work. As I commented off-list, it is a big enough and even different enough beast that it feels like it should be a separate project. The Mahout code base is already uneven and sprawling and I think this would exacerbate that – and not generate much "synergy" worth the effort of integration."

I understand all of these concerns, and I want to provide a general response to clarify some of the points Sean made.

Basically, the patch adds an online version of existing Mahout recommendation capabilities. A matrix-factorization-based recommender learned with Alternating Least Squares already exists in Mahout; this is the SGD-based version. The different-targets approach is just a set of wrappers on those linear models (the same as the Generalized Linear Models approach). Adding side information is optional, and may be beneficial when there is a cold-start issue. Additionally, the OnlineFactorizationRecommender extends AbstractRecommender, and the FactorizationAwareDataModel is a Mahout DataModel composed with a base DataModel that is capable of adding new ratings.

Besides all this, I remember the initiative Ted started following Menon and Elkan's 'Dyadic Prediction Using a Latent Feature Log-Linear Model' paper.
First I intended to improve Ted's initial implementation, then I started a separate implementation to keep the code integrable with Taste from the very beginning. What I mean is, those approaches are really similar. The code is already integrated, and may be one of the many recommender options available to a user.

Finally, I volunteer to keep the code integrated and working, to improve it based on suggestions, and to provide documentation on usage and details. The reason I don't consider starting a separate project, rather than offering to contribute to Mahout, is that I am familiar with the Mahout library, the code already depends on Mahout, and the goal of the project is to be used by people. Mahout already attracts plenty of users and developers, which means the code would be used by more people, and with reviews it could be fixed and improved faster.

Regards

> On Sun, Sep 9, 2012 at 7:01 AM, Gokhan Capan <[email protected]> wrote:
>
> > On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > This sounds pretty exciting. Beyond that, it is hard to say much.
> > >
> > > Can you say a bit more about how you would see introducing the code into
> > > Mahout?
> > >
> >
> > Ted, I've forked apache/mahout at github, and I will merge the library into
> > mahout. I believe in a week I will be able to add documentation and mahout
> > jobs for experiments and start submitting patches to JIRA.
> >

-- Gokhan
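[Editor's note: the SGD-based matrix factorization discussed above can be sketched in plain Java. This is an illustrative toy, not code from the patch or from Mahout's APIs; the class and method names are invented, and the per-rating update rule (prediction error on u·v with L2 regularization) is the standard SGD formulation the thread refers to.]

```java
import java.util.Random;

/**
 * Minimal sketch of SGD-based matrix factorization on (user, item, rating)
 * triples. Not Mahout code; names and data are illustrative only.
 */
public class SgdFactorizerSketch {

    /** Trains U and V by SGD; returns {rmseBefore, rmseAfter}. */
    static double[] train() {
        int numUsers = 4, numItems = 4, k = 2;
        double lr = 0.05, lambda = 0.02;
        int[][] ui = {{0,0},{0,1},{1,1},{1,2},{2,2},{2,3},{3,3},{3,0}};
        double[] r = {5, 3, 4, 2, 5, 1, 4, 3};

        Random rnd = new Random(42);
        double[][] U = new double[numUsers][k];
        double[][] V = new double[numItems][k];
        for (double[] row : U) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();
        for (double[] row : V) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();

        double before = rmse(U, V, ui, r);
        for (int epoch = 0; epoch < 200; epoch++) {
            for (int n = 0; n < r.length; n++) {
                int u = ui[n][0], i = ui[n][1];
                // error on this single observed rating
                double err = r[n] - dot(U[u], V[i]);
                // simultaneous regularized update of both factor vectors
                for (int f = 0; f < k; f++) {
                    double uf = U[u][f], vf = V[i][f];
                    U[u][f] += lr * (err * vf - lambda * uf);
                    V[i][f] += lr * (err * uf - lambda * vf);
                }
            }
        }
        return new double[] {before, rmse(U, V, ui, r)};
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int f = 0; f < a.length; f++) s += a[f] * b[f];
        return s;
    }

    static double rmse(double[][] U, double[][] V, int[][] ui, double[] r) {
        double s = 0;
        for (int n = 0; n < r.length; n++) {
            double e = r[n] - dot(U[ui[n][0]], V[ui[n][1]]);
            s += e * e;
        }
        return Math.sqrt(s / r.length);
    }

    public static void main(String[] args) {
        double[] res = train();
        System.out.printf("rmse before=%.3f after=%.3f%n", res[0], res[1]);
    }
}
```

Because each step touches only one rating, the same update can be applied online as new ratings arrive, which is the property that distinguishes this from the batch ALS factorizer already in Mahout.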
