Re: [jira] Commented: (COUCHDB-623) File format for views is space and time inefficient - use a better one

Simon Metson Thu, 14 Jan 2010 04:53:21 -0800

Hey Chris,

has anyone used Hadoop as an external yet?

I looked at using Disco as an external map reduce (Nokia's erlang/python map reduce framework). There were some issues with how to pass data into the map functions (ideally Disco would get everything upfront, instead of a doc at a time, which I suspect you'd want to do with Hadoop too). One idea I had was basically ignoring what Couch sent the external and having the remote workers pulling things from _all_docs, but that's not going to be very efficient or nice. I didn't really get a chance to finish anything off though, so maybe there's something blindingly obvious you could do. Maybe being able to configure a pipeline size would work here (I remember discussions about pipelining to improve performance on JIRA). Hopefully I'll get some time to play more with Hadoop/Disco and Couch in the not too distant future (at the moment I'm bogged down with project management guff).
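
For illustration only, the "workers pull from _all_docs" idea could look roughly like the sketch below. It assumes CouchDB's standard _all_docs query parameters (include_docs, limit, startkey, skip); the helper names, the batch size, and the server URL and database name in the usage note are placeholders and do not come from Disco, Hadoop, or this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch_page(base_url, db, params):
    """Fetch one page of _all_docs over HTTP (hypothetical helper)."""
    url = f"{base_url}/{db}/_all_docs?{urlencode(params)}"
    with urlopen(url) as resp:
        return json.load(resp)


def iter_doc_batches(fetch_page, batch_size=100):
    """Yield lists of documents, up to batch_size at a time, in doc-id order.

    fetch_page is any callable that takes a dict of query parameters and
    returns the parsed _all_docs response (a dict with a "rows" list).
    """
    params = {"include_docs": "true", "limit": batch_size}
    while True:
        rows = fetch_page(params)["rows"]
        if not rows:
            return
        yield [row["doc"] for row in rows]
        if len(rows) < batch_size:
            return  # a short page means we have reached the end
        # Resume from the last id seen: startkey must be JSON-encoded,
        # and skip=1 drops the row that was already handed out.
        params = {
            "include_docs": "true",
            "limit": batch_size,
            "startkey": json.dumps(rows[-1]["id"]),
            "skip": 1,
        }
```

A worker would call it with something like `functools.partial(http_fetch_page, "http://localhost:5984", "mydb")` and feed each batch to its map function. Every worker paging through the whole database this way is exactly the inefficiency mentioned above, so this is a sketch of the workaround and not a recommendation.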

I vaguely remember Mike saying someone in Cloudant had done something similar with more success...

Cheers
Simon
