Yes. This should be relatively easy to build in Mahout. There are several decent courses of action:
1) Use streaming k-means to reduce the data to a sketch of the original data, then apply an in-memory K-SVD algorithm to that sketch. This could be really, really fast and has a reasonable chance of working well.

2) Pull streaming k-means apart and use the approximate search mechanisms in there to implement a new version of K-SVD.

3) Build an iterative implementation on top of Spark or Giraph using Mahout vectors.

One reservation I might have is that if your data are dense, you might get better performance from something like JBlas. Mahout works well for sparse data, but is a bit sub-par on dense performance.

On Tue, Oct 1, 2013 at 7:34 AM, Niederberger Thomas ([email protected]) <[email protected]> wrote:

> Hello everyone,
>
> I'm currently working on my master's thesis on compressed sensing for video.
> Part of my project might be to learn a dictionary from a large collection
> of videos. The typical algorithm for this task is called K-SVD.
> K-SVD is very similar to k-means, but there are two key differences:
> - A data point is assigned to several 'clusters' (not just one).
> - Instead of computing a cluster center as the mean of all its data points,
> the center is found as the first principal component via an SVD
> (hence the name).
> The original reference for the algorithm is "K-SVD: An Algorithm for
> Designing Overcomplete Dictionaries for Sparse Representation" by M.
> Aharon.
>
> It seems no one has implemented this algorithm on Mahout yet.
> Since I have no experience with Mahout/Hadoop, I wondered whether it would
> be difficult to implement this based on the available K-Means implementation?
> Is anybody interested in this, and could someone point me in the right
> direction for an implementation?
>
> Best,
> Thomas
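P.S. For anyone following along, here is a minimal, illustrative sketch of the K-SVD dictionary-update step being discussed (the part that replaces k-means' "take the mean" with a rank-1 SVD per atom). This is not Mahout code; it uses NumPy for clarity, and it assumes the sparse codes X have already been computed by some sparse-coding stage (e.g. OMP), which is omitted here:

```python
import numpy as np

def ksvd_update(D, Y, X):
    """One sweep of K-SVD atom updates.

    D: dictionary, shape (n, k) -- columns are atoms
    Y: data matrix, shape (n, m) -- columns are signals
    X: sparse codes, shape (k, m), assumed precomputed (e.g. via OMP)
    """
    for j in range(D.shape[1]):
        # Signals whose sparse code actually uses atom j
        users = np.nonzero(X[j, :])[0]
        if users.size == 0:
            continue
        # Residual restricted to those signals, with atom j's
        # contribution added back in (i.e. atom j removed from the model)
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        # Best rank-1 approximation of E via SVD: the first left singular
        # vector becomes the new (unit-norm) atom, and the scaled first
        # right singular vector gives the updated coefficients
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]
        X[j, users] = s[0] * Vt[0, :]
    return D, X
```

Because each atom update is the optimal rank-1 fit to its residual (with the sparsity pattern held fixed), the reconstruction error ||Y - DX||_F is non-increasing over a sweep. The full algorithm alternates this update with a sparse-coding step, which is where option 2 above (Mahout's approximate search machinery) would come in.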
