[ 
https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014804#comment-14014804
 ] 

Maciej Kula commented on MAHOUT-1567:
-------------------------------------

Correct. It is effectively an online dimensionality reduction technique.

We start with a set of n vectors (datapoints, rows in a matrix) and look 
for a smaller set of m vectors such that each original vector can be well 
approximated by a linear combination of the m new vectors. In the parlance of 
dictionary learning, the m vectors are called dictionary atoms, and the 
coefficient vectors of the linear combinations are called (sparse) codes.

The idea of learning a dictionary is that, instead of picking a predefined set 
of vectors, we fit the dictionary vectors to the data using an SGD-like process.
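To make the alternation concrete, here is a minimal sketch of that kind of online loop: for each incoming datapoint, compute a sparse code against the current dictionary, then take an SGD step on the dictionary. The names, the soft-thresholded least-squares coding step, and the column normalization are all illustrative assumptions for this sketch, not the actual implementation from the linked repository (which follows Mairal et al. more closely).

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_atoms = 20, 5
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)  # start with unit-norm dictionary atoms


def sparse_code(x, D, lam=0.1):
    """Least-squares fit followed by soft-thresholding: a crude
    stand-in for a proper lasso/LARS sparse-coding solver."""
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def dictionary_step(D, x, a, lr=0.05):
    """One SGD step on the reconstruction error ||x - D a||^2."""
    residual = x - D @ a
    D = D + lr * np.outer(residual, a)       # gradient step
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)        # keep atom norms bounded


# Stream of datapoints drawn from a hidden sparse low-rank structure.
true_D = rng.standard_normal((n_features, n_atoms))
for _ in range(500):
    mask = rng.random(n_atoms) < 0.3
    x = true_D @ (rng.standard_normal(n_atoms) * mask)
    a = sparse_code(x, D)
    D = dictionary_step(D, x, a)
```

Because the dictionary is updated one datapoint at a time, the whole matrix never needs to be held in memory, which is what makes the approach suitable for the very large, sparse implicit-feedback matrices the issue mentions.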

Shall I amend the issue title to reflect this a little better?

> Add online sparse dictionary learning
> -------------------------------------
>
>                 Key: MAHOUT-1567
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1567
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Maciej Kula
>
> I have recently implemented a sparse online dictionary learning algorithm, 
> with an emphasis on learning very high-dimensional and very sparse 
> dictionaries. It is based on J. Mairal et al 'Online Dictionary Learning for 
> Sparse Coding' (http://www.di.ens.fr/willow/pdfs/icml09.pdf). It's an online 
> variant of low-rank matrix factorization, suitable for sparse binary matrices 
> (such as implicit feedback matrices).
> I would be very happy to bring this up to the Mahout standard and contribute 
> to the main codebase --- is this something you would in principle be 
> interested in having?
> The code (as well as some examples) is here: 
> https://github.com/maciejkula/dictionarylearning



--
This message was sent by Atlassian JIRA
(v6.2#6252)