Yes.  This should be relatively easy to build in Mahout.

There are several decent courses of action:

1) use the streaming k-means to reduce the data to a sketch of the original
data and then apply an in-memory K-SVD algorithm to that sketch.  This could
be really, really fast and has a reasonable chance of working well.

2) pull the streaming k-means apart and use the approximate search
mechanisms in there to implement a new version of K-SVD.

3) build an iterative implementation on top of Spark or Giraph using Mahout
vectors.

One reservation: if your data are dense, you might get better performance
from something like jblas.  Mahout works well for sparse data, but its
dense performance is a bit sub-par.
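As a point of reference for any of the options above, the core K-SVD iteration (sparse coding, then a rank-1 SVD update per atom) is small enough to sketch at toy scale. The following is a dense, NumPy-only sketch under assumed conventions (signals as columns of X, unit-norm atoms, T nonzero coefficients per signal, OMP for the coding stage), not Mahout code and not tuned for large data:

```python
# Toy K-SVD sketch (NumPy only) -- illustrative, not a Mahout implementation.
# Assumptions: signals are columns of X, dictionary atoms are unit-norm
# columns of D, each signal uses at most T atoms.
import numpy as np

def omp(D, x, T):
    """Orthogonal Matching Pursuit: greedily pick T atoms for one signal."""
    residual, support = x.astype(float).copy(), []
    coeffs = np.zeros(0)
    for _ in range(T):
        corr = np.abs(D.T @ residual)
        corr[support] = -1.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    gamma = np.zeros(D.shape[1])
    gamma[support] = coeffs
    return gamma

def ksvd(X, n_atoms, T, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)    # unit-norm atoms
    for _ in range(n_iter):
        # Sparse coding stage: code every signal against the current dictionary
        # (this is the analogue of the k-means assignment step, but with
        # several atoms per signal instead of one cluster per point).
        Gamma = np.column_stack([omp(D, x, T) for x in X.T])
        # Dictionary update stage: refine each atom as the first singular
        # vector of the residual restricted to the signals that use it
        # (this replaces the k-means "take the mean" centroid update).
        for k in range(n_atoms):
            users = np.nonzero(Gamma[k])[0]
            if users.size == 0:
                continue
            Gamma[k, users] = 0.0
            E = X[:, users] - D @ Gamma[:, users]   # error without atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                       # new atom: rank-1 fit
            Gamma[k, users] = s[0] * Vt[0]          # matching coefficients
    return D, Gamma
```

An in-memory routine like this could, in principle, be applied to the weighted sketch points that streaming k-means produces (option 1), or the per-atom update could be distributed (option 3).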


On Tue, Oct 1, 2013 at 7:34 AM, Niederberger Thomas ([email protected]) <
[email protected]> wrote:

> Hello everyone,
> I'm currently working on my master thesis in compressed sensing for video.
> Part of my project might be to learn a dictionary from a large collection
> of videos. The typical algorithm to use for this task is called K-SVD.
> K-SVD is very similar to K-Means but there are two key differences:
>  - A data point is assigned to several 'clusters' (not one)
>  - Instead of updating a cluster center by taking the mean of its assigned
> data points, the center is updated to the first principal component of
> those points, computed via an SVD (hence the name).
> The original reference for the algorithm is "K-SVD: An Algorithm for
> Designing Overcomplete Dictionaries for Sparse Representation" from M.
> Aharon.
>
> It seems no one has implemented this algorithm in Mahout yet.
> Since I have no experience with Mahout/Hadoop, I wonder whether it would
> be difficult to implement this based on the available implementation of
> K-Means?
> Is anybody interested in this who could point me in the right direction
> for an implementation?
>
> Best,
> Thomas
>
