Let's see if I succeed in liberating it.

On Sun, Oct 4, 2009 at 2:49 PM, Ted Dunning <[email protected]> wrote:
> The algorithm in question is a very interesting and *very* useful one for
> language processing and understanding. As Benson says, it is non-trivial
> to implement.
>
> I would be happy to kibitz on a re-implementation as usual, but have
> negative time available for real coding. Is anybody else available?
>
> On Sun, Oct 4, 2009 at 5:25 AM, Benson Margulies <[email protected]> wrote:
>
> > Isabel,
> >
> > Let me give you the background of this. The code in question is a
> > second-generation implementation (by the same author) of this very hard
> > algorithm. The author is a professional who now works for Basis (where I
> > work) who was working, at the time, at an academic institution. I have
> > opened negotiations with the relevant professor to see if I can't find a
> > way to make it open source.
> >
> > Last year, before I knew anything about this implementation, I built one.
> > Mine, written in C++, can turn 111921 words with 8899115 distinct bigrams
> > into 1000 clusters in about 14 hours on a 4-core system. While I'm a
> > fairly experienced C++ code tuner, I am not much of a mathematician, and
> > so I missed some mathematical shortcuts.
> >
> > I used OpenMP to get some parallelization into the code. However, it
> > seems to me that some major rethinking would be required to cast the
> > problem in a form where Java and Hadoop could do anything with it.
> > However, I could just be thick-headed.
> >
> > If I succeed in getting the code to open source, I'd be very interested
> > in seeing if any of you hadoop-artists can suggest an approach that could
> > produce comparable or better clock time without having to apply a
> > gigantic amount of hardware. Given such an approach, the author or I
> > might find it interesting to try an implementation and a contribution to
> > Mahout.
> >
> > --benson
> >
> > On Sun, Oct 4, 2009 at 7:54 AM, Isabel Drost <[email protected]> wrote:
> >
> > > On Saturday 03 October 2009 18:45:12 Sean Owen wrote:
> > > > Let me however revive my suggestion that Mahout include a 'sandbox'
> > > > module of sorts to host anything at all. This neatly allows for
> > > > incorporation of anything, in any state, without confusing users as
> > > > to what should be expected of Mahout 'proper', which should be a
> > > > reasonably high bar come version 1.0.
> > >
> > > +1 Until that is realized, I would suggest to not scare away people
> > > just because they used the "wrong" programming language/lib/...
> > >
> > > Benson, do you think there might be a tiny chance that you can motivate
> > > the student to contribute his implementation as a JIRA issue and work
> > > together with the community to make it run on Hadoop? Does that even
> > > make sense for the algorithm implemented?
> > >
> > > Isabel
> > >
> > > --
> > > |\ _,,,---,,_ Web: <http://www.isabel-drost.de>
> > > /,`.-'`' -. ;-;;,_
> > > |,4- ) )-,_..;\ ( `'-'
> > > '---''(_/--' `-'\_) (fL) IM: <xmpp://[email protected]>
>
> --
> Ted Dunning, CTO
> DeepDyve
