The recent O'Reilly book "Programming Collective Intelligence" might be an
interesting resource for problems and data sources as well.


On 11/16/07 6:21 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:

> Bruce,
> 
> I helped design and teach an undergrad course based on Hadoop last year.
> Along with some folks at Google, we then made the resources available
> together to distribute to other universities and the public at large
> (via Creative Commons license, actually).
> 
> All the materials are available online here:
> http://code.google.com/edu/content/parallel.html
> (lecture notes, labs, and even video lectures.)
> 
> They include suggested lab activities. Good free data sets you can
> download include the Netflix Prize data and a copy of the Wikipedia corpus.
> Of course, you can set up Nutch and do your own web crawl too.
> 
> We also highly endorse the Amazon EC2 idea for doing your own labs :)
> 
> Best of luck,
> - Aaron
>
> Edward Bruce Williams wrote:
>> Hello
>>
>> I am a student doing an independent study project investigating the
>> possibility of teaching large scale computing on a small scale budget.  Th
>>
>> My thought is to use available Open Source (Hadoop) and Creative Commons
>> and other materials as the text.  A student could then do significant
>> computing on Amazon for the cost of what they would usually pay for a
>> textbook.  I have convinced an agency of the state of California that paying
>> for computer time for a CS student is "like buying a textbook or calculator
>> for a math student", so "so far so good."
>>
>> I am asking anyone who has some largish data sets, preferably on Amazon,
>> that we could use for class projects to contact me off list.
>>
>> Thanks,
>>
>> Bruce Williams 
>>
