Let's see if I succeed in liberating it.

On Sun, Oct 4, 2009 at 2:49 PM, Ted Dunning <[email protected]> wrote:

> The algorithm in question is a very interesting and *very* useful one for
> language processing and understanding.  As Benson says, it is non-trivial to
> implement.
>
> I would be happy to kibitz on a re-implementation as usual, but have
> negative time available for real coding.  Is anybody else available?
>
> On Sun, Oct 4, 2009 at 5:25 AM, Benson Margulies <[email protected]> wrote:
>
> > Isabel,
> >
> > Let me give you the background of this. The code in question is a
> > second-generation implementation (by the same author) of this very hard
> > algorithm. The author is a professional who now works for Basis (where I
> > work) who was working, at the time, at an academic institution. I have
> > opened negotiations with the relevant professor to see if I can't find a
> > way to make it open source.
> >
> > Last year, before I knew anything about this implementation, I built one.
> > Mine, written in C++, can turn 111921 words with 8899115 distinct bigrams
> > into 1000 clusters in about 14 hours on a 4-core system. While I'm a fairly
> > experienced C++ code tuner, I am not much of a mathematician, and so I
> > missed some mathematical shortcuts.
> >
> > I used OpenMP to get some parallelization into the code. However, it seems
> > to me that some major rethinking would be required to cast the problem in a
> > form where Java and Hadoop could do anything with it. However, I could just
> > be thick-headed.
> >
> > If I succeed in getting the code to open source, I'd be very interested in
> > seeing if any of you Hadoop artists can suggest an approach that could
> > produce comparable or better clock time without having to apply a gigantic
> > amount of hardware. Given such an approach, the author or I might find it
> > interesting to try an implementation as a contribution to Mahout.
> >
> > --benson
> >
> >
> > On Sun, Oct 4, 2009 at 7:54 AM, Isabel Drost <[email protected]> wrote:
> >
> > > On Saturday 03 October 2009 18:45:12 Sean Owen wrote:
> > > > Let me however revive my suggestion that Mahout include a 'sandbox'
> > > > module of sorts to host anything at all. This neatly allows for
> > > > incorporation of anything, in any state, without confusing users as to
> > > > what should be expected of Mahout 'proper', which should be a
> > > > reasonably high bar come version 1.0.
> > >
> > > +1 Until that is realized, I would suggest not scaring people away just
> > > because they used the "wrong" programming language/lib/...
> > >
> > > Benson, do you think there might be a tiny chance that you can motivate
> > > the student to contribute his implementation as a JIRA issue and work
> > > together with the community to make it run on Hadoop? Does that even make
> > > sense for the algorithm implemented?
> > >
> > > Isabel
> > >
> > >
> > > --
> > >  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
> > >  /,`.-'`'    -.  ;-;;,_
> > >  |,4-  ) )-,_..;\ (  `'-'
> > > '---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[email protected]>
> > >
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
