[
https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014804#comment-14014804
]
Maciej Kula commented on MAHOUT-1567:
-------------------------------------
Correct. It is effectively an online dimensionality reduction technique.
We start with a set of n vectors (data points, rows in a matrix), and then look
for a smaller set of m vectors such that the original vectors can be well
approximated by linear combinations of the m new vectors. In the parlance of
dictionary learning, the m vectors are called dictionary atoms, and the linear
combinations are called (sparse) codes.
The idea of learning a dictionary is that, instead of picking a predefined set
of vectors, we fit the dictionary vectors to the data using an SGD-like process.
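To make that concrete, here is a toy pure-Python sketch of the general idea (not the Mairal et al. algorithm itself, and not the code in the linked repository): codes are computed by a few coordinate-descent passes on an l1-penalized least-squares problem, and the dictionary atoms are updated by plain SGD on the reconstruction residual. All function names, parameters, and the toy data are illustrative assumptions.

```python
import random

def dot(u, v):
    # inner product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))

def sparse_code(x, D, lmbda, iters=10):
    # lasso codes via coordinate descent; assumes unit-norm atoms.
    # (a crude stand-in for the lasso step in Mairal et al.)
    m = len(D)
    code = [0.0] * m
    for _ in range(iters):
        for j in range(m):
            # correlation of atom j with the residual that excludes atom j
            r = dot(x, D[j]) - sum(code[k] * dot(D[k], D[j])
                                   for k in range(m) if k != j)
            mag = max(abs(r) - lmbda, 0.0)
            code[j] = mag if r >= 0 else -mag
    return code

def reconstruct(code, D):
    d = len(D[0])
    return [sum(code[j] * D[j][i] for j in range(len(D))) for i in range(d)]

def avg_error(data, D, lmbda):
    # mean squared reconstruction error over the data set
    total = 0.0
    for x in data:
        r = reconstruct(sparse_code(x, D, lmbda), D)
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return total / len(data)

def learn_dictionary(data, m, lr=0.1, lmbda=0.05, epochs=50, seed=0):
    rng = random.Random(seed)
    d = len(data[0])
    # start from random unit-norm atoms
    D = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    for atom in D:
        n = dot(atom, atom) ** 0.5
        for i in range(d):
            atom[i] /= n
    for _ in range(epochs):
        for x in data:
            code = sparse_code(x, D, lmbda)
            resid = [xi - ri for xi, ri in zip(x, reconstruct(code, D))]
            # SGD step on the reconstruction error, then renormalize each atom
            for j, atom in enumerate(D):
                for i in range(d):
                    atom[i] += lr * resid[i] * code[j]
                n = dot(atom, atom) ** 0.5 or 1.0
                for i in range(d):
                    atom[i] /= n
    return D

# toy data: points in the plane spanned by the first two coordinate axes of R^4
rng = random.Random(42)
data = [[rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5), 0.0, 0.0]
        for _ in range(40)]

D0 = learn_dictionary(data, m=2, epochs=0)    # untrained random dictionary
D = learn_dictionary(data, m=2, epochs=50)    # trained dictionary
err0, err = avg_error(data, D0, 0.05), avg_error(data, D, 0.05)
```

After fitting, the trained dictionary reconstructs the data markedly better than the random one, which is the whole point of learning the atoms rather than fixing them in advance.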
Shall I amend the issue title to reflect this a bit better?
> Add online sparse dictionary learning
> -------------------------------------
>
> Key: MAHOUT-1567
> URL: https://issues.apache.org/jira/browse/MAHOUT-1567
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Reporter: Maciej Kula
>
> I have recently implemented a sparse online dictionary learning algorithm,
> with an emphasis on learning very high-dimensional and very sparse
> dictionaries. It is based on J. Mairal et al., 'Online Dictionary Learning for
> Sparse Coding' (http://www.di.ens.fr/willow/pdfs/icml09.pdf). It's an online
> variant of low-rank matrix factorization, suitable for sparse binary matrices
> (such as implicit feedback matrices).
> I would be very happy to bring this up to the Mahout standard and contribute
> it to the main codebase --- is this something you would in principle be
> interested in having?
> The code (as well as some examples) is here:
> https://github.com/maciejkula/dictionarylearning
--
This message was sent by Atlassian JIRA
(v6.2#6252)