Matrix, Hadoop, text, ... (was Re: Hi)

Paul Elschot Tue, 19 Feb 2008 14:31:21 -0800

One advantage of sparse matrices is that they can easily be sent
around a Hadoop cluster in their complete form.


Would that still leave a need to distribute sparse matrix operations
for a single matrix?
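
To make the "complete form" point concrete, here is a minimal sketch, assuming a matrix small enough after dropping zeros to fit in one message. The row-of-dicts layout, the JSON-lines encoding, and all function names are assumptions for illustration only, not anything Hadoop or Mahout provides.

```python
import json

def to_sparse_rows(dense):
    """Keep only non-zero entries: one {column: value} dict per row."""
    return [{j: v for j, v in enumerate(row) if v != 0} for row in dense]

def serialize(rows):
    """Encode each row as one JSON line of [column, value] pairs."""
    return "\n".join(json.dumps(sorted(r.items())) for r in rows)

def deserialize(text):
    """Rebuild the row dicts from the JSON-lines text."""
    return [{int(j): v for j, v in json.loads(line)}
            for line in text.splitlines()]

def multiply(rows, vector):
    """Sparse matrix times dense vector; only stored entries are touched."""
    return [sum(v * vector[j] for j, v in r.items()) for r in rows]

dense = [[0, 0, 3.0], [0, 0, 0], [1.5, 0, 0]]
rows = to_sparse_rows(dense)
payload = serialize(rows)          # small text blob that any node can receive
restored = deserialize(payload)
product = multiply(restored, [1, 2, 3])
```

If every node can hold the whole payload like this, each node can run the full multiply locally, which is the case where distributing a single matrix operation may not be needed.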

I ask because I just ran into svmlin. It uses a sparse matrix, is meant
for text classification problems, is less than 2000 lines of .cpp code,
and ran pretty fast in my first few attempts:
http://people.cs.uchicago.edu/~vikass/svmlin.html
svmlin has a GPL licence, but it is easy to use in binary form:
one only needs to wrap a class around a process executing
the svmlin program.
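
A rough sketch of such a wrapper class, assuming only that the program is an executable that takes command-line arguments and writes to standard output. The class name and method are hypothetical, and the actual svmlin arguments are deliberately left to the caller since they are not given here.

```python
import subprocess

class ExternalProgram:
    """Hypothetical wrapper: runs a binary as a child process."""

    def __init__(self, executable):
        # Path to the binary, e.g. wherever svmlin was built (assumption).
        self.executable = executable

    def run(self, args, timeout=None):
        """Run the binary with the given arguments and return its stdout.

        Raises subprocess.CalledProcessError on a non-zero exit status.
        """
        result = subprocess.run(
            [self.executable, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout
```

The caller would write the sparse training data to files, pass the file names as arguments, and read the results back from the files or output that the program produces.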

I'm probably missing something; this sounds too easy.

Regards,
Paul Elschot


On Tuesday 19 February 2008 21:42:25, Grant Ingersoll wrote:
> My gut feeling is that we are going to have to build our own, but I
> don't know for sure yet.  Just seems like it would be a lot more work
> to try to bring someone else's library into Hadoop than to just build
> what we need in Hadoop, but I am open to suggestions.   Plus, I am
> biased towards fewer dependencies.  Makes it easier for people to
> adopt us and easier to manage, at the cost of some extra development
> work.  Besides, no one sounds particularly enthusiastic about what is
> available.
>
> -Grant
>
> On Feb 19, 2008, at 3:16 PM, Ted Dunning wrote:
> > I have been unable to determine whether the hadoop matrix is real
> > or not.  From discussions, it definitely isn't sparse.
> >
> > Sparsity is absolutely a must and not just for text.  Really huge
> > machine learning tends toward sparsity, regardless of area.
> >
> > On 2/19/08 12:13 PM, "Jason Rennie" <[EMAIL PROTECTED]> wrote:
> >> On Mon, Feb 18, 2008 at 8:43 PM, Grant Ingersoll
> >> <[EMAIL PROTECTED]> wrote:
> >>> Yeah, we have had a few discussions on this.   There is some
> >>> support in Hadoop already for Matrix calculations via a donation,
> >>> but I don't know that anyone has dug in too deep with it yet.  It
> >>> may be the case that we start with something, and then decide to
> >>> go with something else as we get more running time together on
> >>> this stuff.
> >>
> >> Is the hadoop matrix lib sparse?  I think I took a quick look and
> >> didn't find any indication of such.  If a significant application
> >> area of mahout is text, sparsity is a must.  Even non-text domains,
> >> such as collaborative filtering, often require sparse
> >> representation in order to scale to medium-sized data sets.  But,
> >> yeah, understood that it's good to hit the ground running, see how
> >> far we can get and make changes as necessary/useful :)
> >>>
> >>> Read only access is available via:
> >>> svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk
> >>
> >> Thanks.  I was trying to checkout one directory too high.
> >>
> >> Jason
