The recent O'Reilly book "Programming Collective Intelligence" might be an interesting resource for problems and data sources as well.

On 11/16/07 6:21 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:

> Bruce,
>
> I helped design and teach an undergrad course based on Hadoop last year.
> Along with some folks at Google, we then made the resources available
> together to distribute to other universities and the public at large
> (via Creative Commons license, actually).
>
> All the materials are available online here:
> http://code.google.com/edu/content/parallel.html
> (lecture notes, labs, and even video lectures.)
>
> It includes suggested lab activities. Good free data sets you can
> download include Netflix prize data and a copy of the wikipedia corpus.
> Of course, you can set up Nutch and do your own web crawl too.
>
> We also highly endorse the Amazon EC2 idea for doing your own labs :)
>
> Best of luck,
> - Aaron
>
> Edward Bruce Williams wrote:
>> Hello
>>
>> I am a student doing an independent study project investigating the
>> possibility of teaching large scale computing on a small scale budget. Th
>>
>> My thought is to use available Open Source (Hadoop) and Creative Commons
>> and other materials as the text. A student could then do significant
>> computing on Amazon for the cost of what they would usually pay for a
>> textbook. I have convinced an agency of the state of California that paying
>> for computer time for a CS student is "like buying a textbook or calculator
>> for a math student", so "so far so good."
>>
>> I am asking if anyone has some largish data sets, preferably on Amazon, we
>> could use for class projects to contact me off list.
>>
>> Thanks,
>>
>> Bruce Williams
