Hey Chris,
has anyone used Hadoop as an external yet?
I looked at using Disco (Nokia's Erlang/Python MapReduce framework) as an external map/reduce. There were some issues with how to pass data into the map functions: ideally Disco would get everything up front instead of a doc at a time, which I suspect you'd want with Hadoop too. One idea I had was to basically ignore what Couch sends the external and have the remote workers pull things from _all_docs, but that's not going to be very efficient or nice. I didn't really get a chance to finish anything off though, so maybe there's something blindingly obvious you could do. Being able to configure a pipeline size might work here (I remember discussions on JIRA about pipelining to improve performance). Hopefully I'll get some time to play more with Hadoop/Disco and Couch in the not too distant future (at the moment I'm bogged down with project management guff).
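For what it's worth, the _all_docs idea would look roughly like the sketch below. This is only an illustration, not something I finished or tested against Disco or Hadoop: the function names (all_docs_url, fetch_json, iter_docs) and the page size are made up for the example, and it assumes a plain CouchDB reachable over HTTP. Each worker pages through _all_docs with include_docs=true, asking for one extra row per page and using it as the startkey of the next page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def all_docs_url(base_url, db, limit, startkey=None):
    """Build an _all_docs URL for one page, with document bodies included."""
    params = {"include_docs": "true", "limit": limit}
    if startkey is not None:
        # CouchDB expects keys as JSON, so the doc id is JSON-encoded.
        params["startkey"] = json.dumps(startkey)
    return f"{base_url}/{db}/_all_docs?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher (hypothetical helper): GET the URL and decode JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, db, page_size=100, fetch=fetch_json):
    """Yield every document, one page of _all_docs at a time."""
    startkey = None
    while True:
        # Ask for one extra row; it becomes the start of the next page.
        page = fetch(all_docs_url(base_url, db, page_size + 1, startkey))
        rows = page["rows"]
        for row in rows[:page_size]:
            yield row["doc"]
        if len(rows) <= page_size:
            break
        # startkey is inclusive, so the extra row is not yielded twice.
        startkey = rows[page_size]["id"]
```

A worker would then do something like `for doc in iter_docs("http://localhost:5984", "mydb"): ...` (host and database name are placeholders) and feed each doc to its map function. It still drags every document over HTTP, which is why I don't think it's a very nice approach.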
I vaguely remember Mike saying someone at Cloudant had done something similar with more success...
Cheers,
Simon
