Gokhan Capan created MAHOUT-1069:
------------------------------------

             Summary: Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run
                 Key: MAHOUT-1069
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1069
             Project: Mahout
          Issue Type: Improvement
          Components: CLI, Collaborative Filtering
    Affects Versions: 0.8
            Reporter: Gokhan Capan
            Assignee: Sean Owen


Following our conversations on the dev list, I have completed merging the 
recommender algorithms mentioned in http://goo.gl/fh4d9 into Mahout.

These are a set of learning algorithms for matrix-factorization-based 
recommendation, capable of:

* Recommending multiple targets:
*# Numerical Recommendation with OLS Regression
*# Binary Recommendation with Logistic Regression
*# Multinomial Recommendation with Softmax Regression
*# Ordinal Recommendation with Proportional Odds Model

* Leveraging side info in Mahout vector format where available
*# User side information
*# Item side information
*# Dynamic side information (side info at the moment of feedback, such as proximity, day of week, etc.)

* Online learning
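For illustration, one online SGD step for the numerical (OLS) and binary (logistic) targets might look like the sketch below. It is a minimal sketch under stated assumptions, not the code in the patch: it assumes a biased matrix factorization whose score also includes a linear term over dynamic side-info features, it uses plain double arrays instead of Mahout Vectors, and it omits user/item side info as well as the softmax and ordinal targets. All class, method, and parameter names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch (not the MAHOUT-1069 code): online SGD for a biased
 *  matrix factorization with a linear term over dynamic side-info features. */
class SgdFactorizationSketch {

  enum Target { NUMERICAL, BINARY }

  final int k;                  // number of latent factors
  final double learningRate;
  final double lambda;          // L2 regularization rate
  final Target target;
  double globalBias;
  final double[] userBias, itemBias;
  final double[][] userFactors, itemFactors;
  final double[] dynamicWeights; // one weight per dynamic side-info feature

  SgdFactorizationSketch(int numUsers, int numItems, int numDynamic, int k,
                         double learningRate, double lambda, Target target, long seed) {
    this.k = k;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.target = target;
    userBias = new double[numUsers];
    itemBias = new double[numItems];
    userFactors = new double[numUsers][k];
    itemFactors = new double[numItems][k];
    dynamicWeights = new double[numDynamic];
    Random rnd = new Random(seed);
    // Small random initialization of the latent factors.
    for (double[] row : userFactors) {
      for (int f = 0; f < k; f++) row[f] = 0.01 * rnd.nextGaussian();
    }
    for (double[] row : itemFactors) {
      for (int f = 0; f < k; f++) row[f] = 0.01 * rnd.nextGaussian();
    }
  }

  /** Linear score: biases + dynamic side-info term + latent dot product. */
  double score(int u, int i, double[] dynamic) {
    double s = globalBias + userBias[u] + itemBias[i];
    for (int j = 0; j < dynamic.length; j++) s += dynamicWeights[j] * dynamic[j];
    for (int f = 0; f < k; f++) s += userFactors[u][f] * itemFactors[i][f];
    return s;
  }

  /** Identity link for the numerical target, sigmoid link for the binary one. */
  double predict(int u, int i, double[] dynamic) {
    double s = score(u, i, dynamic);
    return target == Target.BINARY ? 1.0 / (1.0 + Math.exp(-s)) : s;
  }

  /** One online SGD step on a single feedback (u, i, dynamic, y). */
  void train(int u, int i, double[] dynamic, double y) {
    // Squared loss with identity link and log loss with sigmoid link both give
    // dLoss/dScore = prediction - y, so the updates below serve both targets.
    double err = y - predict(u, i, dynamic);
    globalBias += learningRate * err;
    userBias[u] += learningRate * (err - lambda * userBias[u]);
    itemBias[i] += learningRate * (err - lambda * itemBias[i]);
    for (int j = 0; j < dynamic.length; j++) {
      dynamicWeights[j] += learningRate * (err * dynamic[j] - lambda * dynamicWeights[j]);
    }
    for (int f = 0; f < k; f++) {
      double pu = userFactors[u][f], qi = itemFactors[i][f];
      userFactors[u][f] += learningRate * (err * qi - lambda * pu);
      itemFactors[i][f] += learningRate * (err * pu - lambda * qi);
    }
  }

  public static void main(String[] args) {
    SgdFactorizationSketch m =
        new SgdFactorizationSketch(2, 2, 1, 4, 0.05, 0.01, Target.BINARY, 42L);
    double[] weekend = {1.0}; // hypothetical dynamic feature, e.g. "is weekend"
    for (int epoch = 0; epoch < 500; epoch++) {
      m.train(0, 0, weekend, 1.0); // user 0 liked item 0
      m.train(0, 1, weekend, 0.0); // user 0 did not like item 1
    }
    System.out.printf("p(u0 likes i0) = %.3f%n", m.predict(0, 0, weekend));
    System.out.printf("p(u0 likes i1) = %.3f%n", m.predict(0, 1, weekend));
  }
}
```

Because each call to train() consumes a single feedback, the same loop works for online learning as new feedback arrives; only predict() differs between the two targets.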

Several command-line tools are provided as Mahout jobs, covering pre-experiment 
utilities and the running of experiments.

Evaluation tools for numerical and categorical recommenders are also included.

A simple example for the MovieLens-1M dataset is provided. It achieved fairly 
good results: an RMSE of 0.851 on a randomly generated test split, with the 
learning and regularization rates tuned beforehand on a separate validation set.
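RMSE here is the square root of the mean squared difference between predicted and actual ratings on the test set. A minimal sketch of the metric follows; the Rmse class is hypothetical and is not the evaluation tool included in the patch.

```java
/** Hypothetical helper: root mean squared error over paired predictions and ratings. */
final class Rmse {

  static double rmse(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("need equally sized, non-empty arrays");
    }
    double sumSq = 0.0;
    for (int n = 0; n < predicted.length; n++) {
      double diff = predicted[n] - actual[n];
      sumSq += diff * diff; // accumulate squared error
    }
    return Math.sqrt(sumSq / predicted.length);
  }

  public static void main(String[] args) {
    double[] predicted = {3.5, 4.0, 2.0};
    double[] actual = {4.0, 4.0, 3.0};
    System.out.println(rmse(predicted, actual));
  }
}
```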

No existing Mahout code is modified, apart from the lines added to 
driver.class.props for the command-line tools. Even so, this is a large patch 
with dozens of new source files.

These algorithms are heavily inspired by various influential recommender 
systems papers, especially Yehuda Koren's. For example, the ordinal model 
follows Koren's OrdRec paper, except that the cuts (thresholds) are global 
rather than user-specific.
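In a proportional odds model with S rating levels and increasing global cuts t_1 < ... < t_{S-1}, the probability of a rating of at most r is sigmoid(t_r - s), where s is the (user, item) score, and P(rating = r) is the difference of consecutive cumulative probabilities. The sketch below assumes exactly this parametrization, with one threshold array shared by all users instead of OrdRec's per-user thresholds; the class and method names are hypothetical.

```java
/** Hypothetical sketch: proportional odds (cumulative logit) model with global cuts. */
final class ProportionalOddsSketch {

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /** Returns P(rating = r) for r = 1..cuts.length + 1, given the linear score
   *  of a (user, item) pair and strictly increasing global cuts. */
  static double[] ratingProbabilities(double score, double[] cuts) {
    int levels = cuts.length + 1;
    double[] probs = new double[levels];
    double previousCdf = 0.0;
    for (int r = 0; r < levels; r++) {
      // P(rating <= r + 1) = sigmoid(cut_r - score); the top level has CDF 1.
      double cdf = (r < cuts.length) ? sigmoid(cuts[r] - score) : 1.0;
      probs[r] = cdf - previousCdf;
      previousCdf = cdf;
    }
    return probs;
  }

  /** Expected rating on the 1..S scale under the given level probabilities. */
  static double expectedRating(double[] probs) {
    double expected = 0.0;
    for (int r = 0; r < probs.length; r++) expected += (r + 1) * probs[r];
    return expected;
  }

  public static void main(String[] args) {
    double[] cuts = {-2.0, -0.5, 0.5, 2.0}; // illustrative cuts for 5 rating levels
    double[] p = ratingProbabilities(1.0, cuts);
    for (int r = 0; r < p.length; r++) {
      System.out.printf("P(rating = %d) = %.3f%n", r + 1, p[r]);
    }
    System.out.printf("expected rating = %.3f%n", expectedRating(p));
  }
}
```

A higher score lowers every cumulative probability, shifting mass toward higher ratings, while the shared cuts keep the same spacing between levels for every user.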

Left for future work:
# The core algorithms are tested, but some parts are probably not covered by the tests. I have seen many of these parts run without problems, but I am going to add new tests regularly.
# Not all algorithms have been tried on appropriate datasets, and some may need improvement. However, I am also using these algorithms for my M.Sc. thesis, so I will eventually submit more experiments. Since the experiment infrastructure is in place, I hope the community will contribute experiments too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira