I don't think I can contribute much to the algorithms themselves, but I've got a 12+ node Hadoop cluster and I'd be keen on helping to run them on it.
Jeff Eastman

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Sunday, January 27, 2008 10:39 AM
To: [email protected]
Subject: Machine Resources [was Re: Confluence Wiki]

On Jan 25, 2008, at 11:43 PM, Mason Tang wrote:
>
> Also, is there any chance we'll be able to get a small (and I mean
> small) cluster to run some tests on? Local Hadoop testing only gets
> you so far...

Yeah, this type of thing is perennially a problem. I think we will have to beg/borrow/steal (just kidding on the steal). The key will be to get local stuff running and then start looking around for resources. Amazon EC2 is an obvious option, but short of someone donating time on it, I am not sure how we would come by it. I don't know enough about Apache's infrastructure to know whether there is enough to cobble together. Committers can get access to Lucene's zones (virtual server) machine. Presumably this is a problem that Nutch faces as well. Hadoop, luckily, is fairly well supported by Yahoo! and other companies with machine access.

My hope is that if we can show some promise with code that runs well on single machines or small clusters, maybe we can garner interest from bigger supporters. And, of course, most machines are multi-core these days, and Hadoop can leverage that, as I understand it.

Perhaps, if we can organize it and make sure it is secure, we can figure out a way for the various people here to pool our resources. Just thinking out loud...

-Grant
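[Editor's note: on the multi-core point above, a minimal sketch of how a Hadoop node of that era could be configured to run multiple tasks per machine. The property names follow the 0.x-series `hadoop-site.xml` conventions and the values are illustrative assumptions, not a recommendation; check the defaults shipped with your Hadoop version.]

```xml
<!-- hadoop-site.xml: illustrative overrides only.
     Property names are from the Hadoop 0.x series and may
     differ in other versions. -->
<configuration>
  <property>
    <!-- Run up to 4 concurrent map tasks per TaskTracker,
         e.g. one per core on a quad-core node. -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- Reduce tasks are typically fewer per node. -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```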
