On Sun, Sep 9, 2012 at 10:27 PM, Ted Dunning <[email protected]> wrote:
> Great.
>
> If the update has a huge impact on existing code, can you break it into
> manageable pieces?
>
> If it is just an addition, having a big blob of stuff is probably fine.

Now it is integrable into Mahout, and it works with Mahout's existing Recommender interface. It does not modify any existing code, except for a couple of additional lines in driver.class.props, which define a few command-line utilities I find useful while experimenting with a recommender. By the way, I found a few minor bugs and updated the patch. Did you have a chance to look at it?

Secondly, I would like to bump the thread to trigger a discussion on this. Sean raised some concerns about the patch (available on the JIRA page as a comment). Quoting Sean's comment:

"I imagine this is all great work. As I commented off-list, it is a big enough and even different enough beast that it feels like it should be a separate project. The Mahout code base is already uneven and sprawling and I think this would exacerbate that – and not generate much "synergy" worth the effort of integration."

I understand all of these concerns, and I want to provide a general response to clarify some of the points Sean made.

Basically, the patch adds an online version of existing Mahout recommendation capabilities. A matrix-factorization-based recommender learned with Alternating Least Squares already exists in Mahout; this is the SGD-based version. The different-targets approach is just a set of wrappers on those linear models (the same as the Generalized Linear Models approach). Adding side information is optional, and may be beneficial when there is a cold-start issue. Additionally, the OnlineFactorizationRecommender extends AbstractRecommender, and the FactorizationAwareDataModel is a Mahout DataModel composed with a base DataModel that is capable of adding new ratings.

Besides all this, I remember the initiative Ted started following Menon and Elkan's 'Dyadic Prediction Using a Latent Feature Log-Linear Model' paper.
First I intended to improve Ted's initial implementation, then I started a separate implementation to keep the code integrable with Taste from the very beginning. What I mean is, those approaches are really similar. The code is already integrated, and may be one of the many recommender options available to a user.

Finally, I volunteer to keep the code integrated and working, to improve it based on suggestions, and to provide documentation on usage and details. The reason I don't consider starting a separate project, rather than offering to contribute to Mahout, is that I am familiar with the Mahout library, the code already depends on Mahout, and the goal of the project is to be used by people. Mahout already attracts plenty of users and developers, which means the code would be used by more people, and with reviews it could be fixed and improved faster.

Regards

> On Sun, Sep 9, 2012 at 7:01 AM, Gokhan Capan <[email protected]> wrote:
>
> > On Fri, Sep 7, 2012 at 12:48 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > This sounds pretty exciting. Beyond that, it is hard to say much.
> > >
> > > Can you say a bit more about how you would see introducing the code into
> > > Mahout?
> > >
> >
> > Ted, I've forked apache/mahout at github, and I will merge the library into
> > mahout. I believe in a week I will be able to add documentation and mahout
> > jobs for experiments and start submitting patches to JIRA.
> >

-- Gokhan
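[Editor's note: the SGD-based matrix factorization discussed above can be sketched in plain Java. This is an illustrative toy, not code from the patch or from Mahout's APIs; the class and method names are invented, and the per-rating update rule (prediction error on u·v with L2 regularization) is the standard SGD formulation the thread refers to.]

```java
import java.util.Random;

/**
 * Minimal sketch of SGD-based matrix factorization on (user, item, rating)
 * triples. Not Mahout code; names and data are illustrative only.
 */
public class SgdFactorizerSketch {

    /** Trains U and V by SGD; returns {rmseBefore, rmseAfter}. */
    static double[] train() {
        int numUsers = 4, numItems = 4, k = 2;
        double lr = 0.05, lambda = 0.02;
        int[][] ui = {{0,0},{0,1},{1,1},{1,2},{2,2},{2,3},{3,3},{3,0}};
        double[] r = {5, 3, 4, 2, 5, 1, 4, 3};

        Random rnd = new Random(42);
        double[][] U = new double[numUsers][k];
        double[][] V = new double[numItems][k];
        for (double[] row : U) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();
        for (double[] row : V) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextGaussian();

        double before = rmse(U, V, ui, r);
        for (int epoch = 0; epoch < 200; epoch++) {
            for (int n = 0; n < r.length; n++) {
                int u = ui[n][0], i = ui[n][1];
                // error on this single observed rating
                double err = r[n] - dot(U[u], V[i]);
                // simultaneous regularized update of both factor vectors
                for (int f = 0; f < k; f++) {
                    double uf = U[u][f], vf = V[i][f];
                    U[u][f] += lr * (err * vf - lambda * uf);
                    V[i][f] += lr * (err * uf - lambda * vf);
                }
            }
        }
        return new double[] {before, rmse(U, V, ui, r)};
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int f = 0; f < a.length; f++) s += a[f] * b[f];
        return s;
    }

    static double rmse(double[][] U, double[][] V, int[][] ui, double[] r) {
        double s = 0;
        for (int n = 0; n < r.length; n++) {
            double e = r[n] - dot(U[ui[n][0]], V[ui[n][1]]);
            s += e * e;
        }
        return Math.sqrt(s / r.length);
    }

    public static void main(String[] args) {
        double[] res = train();
        System.out.printf("rmse before=%.3f after=%.3f%n", res[0], res[1]);
    }
}
```

Because each step touches only one rating, the same update can be applied online as new ratings arrive, which is the property that distinguishes this from the batch ALS factorizer already in Mahout.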
