One advantage of sparse matrices is that they can easily be sent around a Hadoop cluster in their complete form.
Would that still leave a need to distribute sparse matrix operations
for a single matrix?

I just ran into svmlin: it uses a sparse matrix, is meant for text
classification problems, is fewer than 2000 lines of .cpp code, and
ran pretty fast in my first few attempts:
http://people.cs.uchicago.edu/~vikass/svmlin.html

svmlin has a GPL licence, but it is easy to use in binary form: one
only needs to wrap a class around a process executing the svmlin
program.

I'm probably missing something; this sounds too easy.

Regards,
Paul Elschot

On Tuesday 19 February 2008 21:42:25, Grant Ingersoll wrote:
> My gut feeling is that we are going to have to build our own, but I
> don't know for sure yet. Just seems like it would be a lot more work
> to try to bring someone else's library into Hadoop than to just build
> what we need in Hadoop, but I am open to suggestions. Plus, I am
> biased towards fewer dependencies. Makes it easier for people to
> adopt us and easier to manage, at the cost of some extra development
> work. Besides, no one sounds particularly enthusiastic about what is
> available.
>
> -Grant
>
> On Feb 19, 2008, at 3:16 PM, Ted Dunning wrote:
> > I have been unable to determine whether the hadoop matrix is real
> > or not. From discussions, it definitely isn't sparse.
> >
> > Sparsity is absolutely a must and not just for text. Really huge
> > machine learning tends toward sparsity, regardless of area.
> >
> > On 2/19/08 12:13 PM, "Jason Rennie" <[EMAIL PROTECTED]> wrote:
> >> On Mon, Feb 18, 2008 at 8:43 PM, Grant Ingersoll
> >> <[EMAIL PROTECTED]> wrote:
> >>> yeah, we have had a few discussions on this. There is some
> >>> support in Hadoop already for Matrix calculations via a donation,
> >>> but I don't know that anyone has dug in too deep with it yet. It
> >>> may be the case that we start with something, and then decide to
> >>> go with something else as we get more running time together on
> >>> this stuff.
> >>
> >> Is the hadoop matrix lib sparse? I think I took a quick look and
> >> didn't find any indication of such. If a significant application
> >> area of mahout is text, sparsity is a must. Even non-text domains,
> >> such as collaborative filtering, often require sparse
> >> representation in order to scale to medium-sized data sets. But,
> >> yeah, understood that it's good to hit the ground running, see how
> >> far we can get and make changes as necessary/useful :)
> >>
> >>> Read only access is available via:
> >>> svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk
> >>
> >> Thanks. I was trying to checkout one directory too high.
> >>
> >> Jason
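
The "wrap a class around a process" idea from the top of this message might look roughly like the minimal sketch below. It is written in Python for brevity. The class name `ExternalClassifier`, the binary path, and the timeout are hypothetical, and no svmlin command-line options are assumed: arguments are passed through exactly as given.

```python
import subprocess
from pathlib import Path
from typing import Sequence


class ExternalClassifier:
    """Runs an external command-line program (such as svmlin) as a child process."""

    def __init__(self, binary: str, timeout_seconds: float = 600.0):
        # Hypothetical path to the executable; not taken from the svmlin docs.
        self.binary = binary
        self.timeout_seconds = timeout_seconds

    def run(self, args: Sequence[str], workdir: str = ".") -> str:
        """Run the binary with the given arguments and return its standard output.

        Arguments are passed through untouched; the program's own
        documentation decides which options are valid.
        """
        completed = subprocess.run(
            [self.binary, *args],
            cwd=Path(workdir),
            capture_output=True,   # collect stdout and stderr
            text=True,             # decode output as text
            timeout=self.timeout_seconds,
            check=True,            # raise CalledProcessError on non-zero exit
        )
        return completed.stdout
```

A caller would construct the wrapper once with the location of the binary and then call `run` with whatever arguments the program expects. Running the tool as a separate process also keeps the GPL-licensed code out of the calling code base, which seems to be the point of using it in binary form.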
