I've stuck up a little slideset with some thoughts on what we could be doing to get tighter engagement between universities and the Hadoop codebase
http://www.slideshare.net/steve_l/hadoop-and-universities

There are various levels of engagement

- teaching MapReduce and other datacentre-scale coding techniques.
This could be done by getting involved with the undergrad/grad teachers, helping with the lectures and the coursework. Remember, most universities do welcome outside lecturers giving guest talks to the students.
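For the teaching angle, the core idea is small enough to show on one slide. Here's a toy sketch of the map/reduce pattern in plain Java, with no Hadoop dependency; the class and method names are mine, not Hadoop's API, and a HashMap stands in for the shuffle that a real cluster does between the map and reduce phases:

```java
import java.util.HashMap;
import java.util.Map;

// Toy word count: "map" emits a (word, 1) pair per token, "reduce"
// sums the counts per key. On a real cluster the map and reduce
// steps run on different machines, with a shuffle/sort in between;
// here the HashMap plays the role of that shuffle.
public class WordCountSketch {

    public static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {                      // each mapper gets some lines
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                Integer n = counts.get(word);            // the "reduce": sum per key
                counts.put(word, n == null ? 1 : n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "hadoop at university", "teaching hadoop" };
        System.out.println(wordCount(lines).get("hadoop")); // prints 2
    }
}
```

The point for students is that because map is per-record and reduce is per-key, the same program scales from this HashMap to thousands of nodes without changing the logic.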

- encourage scientific computation to be done on top of Hadoop. There's one small problem there: cluster time. That means people with access to datacentres with CPU and storage to spare need to lend a hand here, or we help get Hadoop up on the existing clusters the physicists run. [see http://www.slideshare.net/steve_l/hadoop-hep for some plans here]

- encourage people doing maths and CS work to do it on Hadoop.
The plugins for scheduling and placement are a good low-risk place where we can get people involved; the other area of interest is the stuff on top, such as graphs and other leading-edge work.
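To give a flavour of why scheduling/placement is a nice self-contained research problem: the recurring decision is "given the nodes with a free slot, which one should run this task?". A heavily simplified sketch, in plain Java with made-up names (this is not Hadoop's plugin API), of a data-locality-first choice:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the decision a placement plugin makes:
// prefer a node that already holds a replica of the task's input
// block, so the computation moves to the data rather than the data
// to the computation. Names and signatures are illustrative only.
public class LocalityPlacementSketch {

    // candidates: nodes that currently have a free task slot
    // replicaHolders: nodes holding a replica of the input split
    public static String pickNode(List<String> candidates,
                                  Set<String> replicaHolders) {
        for (String node : candidates) {
            if (replicaHolders.contains(node)) {
                return node;                  // data-local: the best case
            }
        }
        // No data-local slot free; fall back to any free node. A real
        // scheduler would try rack-local before going fully remote.
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        String chosen = pickNode(List.of("node1", "node2", "node3"),
                                 Set.of("node2", "node7"));
        System.out.println(chosen);           // node2 holds a replica
    }
}
```

Everything interesting for researchers lives in variations on that loop: fairness across users, waiting briefly for a local slot rather than taking a remote one, placement under failure, and so on.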

- get some of the people doing work in this area to talk at ApacheCon. Come on, you want to know how to debug a particle accelerator experiment that generates 1 PB/month of data, of which only 50 events/year are actually interesting.

Term-time is rushing up, so now might be a good time for Apache to get ready to bring the universities on board.

I'm going to start by proposing we create a hadoop-research list, for people doing more researchy stuff on top of and inside Hadoop. We can then start identifying who is interested in this area, which academic and industrial people are involved and where they are, and start meeting up. We had a good little workshop at Bristol University last month (http://wiki.apache.org/hadoop/BristolHadoopWorkshop); it was good to get the different groups together.

Thoughts?

-steve
