FYI.  Following on a brief discussion Tuesday at the data mining session....

Google & IBM giving students a distributed systems lab using Hadoop
<http://feeds.feedburner.com/%7Er/oreilly/radar/atom/%7E3/167584952/google_ibm_give.html>

Posted: 09 Oct 2007 04:07 PM CDT

By Jesse Robbins

Google <http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html>
and IBM have partnered
<http://www-03.ibm.com/press/us/en/pressrelease/22414.wss> to give
university students hands-on experience developing software for
large-scale distributed systems. This initiative focuses on parallel
processing of large data sets using Hadoop
<http://lucene.apache.org/hadoop/>, an open source implementation of
Google's MapReduce <http://labs.google.com/papers/mapreduce.html>.
(See Tim's earlier post about Yahoo and Hadoop
<http://radar.oreilly.com/archives/2007/08/yahoos_bet_on_h.html>.)
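For readers new to the model: here is a minimal sketch, in plain Python
rather than Hadoop's actual Java API, of the map -> shuffle -> reduce flow
that MapReduce (and thus Hadoop) is built around. The function names and
sample data are illustrative, not part of any Hadoop interface.

```python
from collections import defaultdict

def map_phase(line):
    # Map step: emit an intermediate (word, 1) pair for each word.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Reduce step: combine all intermediate values for one key.
    return word, sum(counts)

lines = ["large scale data", "large data sets"]

# Shuffle step: group intermediate pairs by key before reducing.
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # {'large': 2, 'scale': 1, 'data': 2, 'sets': 1}
```

In a real Hadoop job the map and reduce functions run in parallel across
the cluster and the framework handles the shuffle; the per-record logic,
though, stays this simple.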


"The goal of this initiative is to improve computer science students'
knowledge of highly parallel computing practices to better address the
emerging paradigm of large-scale distributed computing. IBM and Google are
teaming up to provide hardware, software and services to augment university
curricula and expand research horizons. With their combined resources, the
companies hope to lower the financial and logistical barriers for the
academic community to explore this emerging model of computing."

The project currently includes the University of Washington, Carnegie
Mellon University, MIT, Stanford, UC Berkeley, and the University of
Maryland. Students in participating classes will have access to a
dedicated cluster of "several hundred computers" running Linux under Xen
virtualization <http://www.xensource.com/Pages/default.aspx>. The
project is expected to expand to thousands of processors and eventually
be open to researchers and students at other institutions.

As part of this effort, Google and the University of Washington have
released a Creative Commons licensed curriculum to help teach distributed
systems concepts and techniques
<http://code.google.com/edu/content/parallel.html>. IBM is also
providing Hadoop plug-ins for Eclipse
<http://www.alphaworks.ibm.com/tech/mapreducetools>.


*Note:* You can also build similar systems using Hadoop with Amazon EC2
<http://wiki.apache.org/lucene-hadoop/AmazonEC2>. Tom White recently
posted an excellent guide
<http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112>
and Powerset has been using this in production
<http://www.royans.net/arch/2007/09/13/scaling-powerset-using-amazons-ec2-and-s3/>
for quite some time.

--tj
-- 
==========================================
J. T. Johnson
Institute for Analytic Journalism -- Santa Fe, NM USA
www.analyticjournalism.com
505.577.6482(c)                                 505.473.9646(h)
http://www.jtjohnson.com                 [EMAIL PROTECTED]

"You never change things by fighting the existing reality.
To change something, build a new model that makes the
existing model obsolete."
                                                   -- Buckminster Fuller
==========================================
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
