On Mon, Aug 2, 2010 at 2:54 PM, Paul Davis <[email protected]> wrote:
> On Mon, Aug 2, 2010 at 5:34 PM, Mikeal Rogers <[email protected]> wrote:
> >>
> >> For the first point about CommonJS modules in Map/Reduce views I'd say
> >> the goal is fine, but I don't understand how or why you'd want that
> >> hash to happen in JavaScript. Unless I'm mistaken, aren't the import
> >> statements executable JS? As in, is there any requirement that you
> >> couldn't import a module inside your map function? In which case, JS
> >> can't really hash all imported modules until after all possible code
> >> paths have been traced?
> >>
> >> I think a better answer would be to allow CommonJS modules, but only
> >> in some namespace of the design document. (IIRC, the other functions
> >> can pull from anywhere, but that would make all design doc updates
> >> trigger view regeneration.) Then Erlang just loads this namespace and
> >> anything that could be imported is included in the hash somehow (a
> >> hash of sorted hashes or some such).
> >
> > This is an interesting idea and I think I like it more than my original
> > proposal. My fear with the original proposal was that it might be
> > opaque to most users what will invalidate their views if we start doing
> > fancy invalidation on modules they use. If we re-scope or restrict the
> > module support to an attribute, that would make it very clear that
> > changes to those modules will invalidate the view.
> >
> >> Batching docs across the I/O might not give you as much of a
> >> performance improvement as you'd think. There's a pretty nasty time
> >> explosion on parsing larger JSON documents in some of the various
> >> parsers I've tried. I've noticed this on various pure Erlang parsers,
> >> but I wouldn't be surprised if json.js suffered as well. By this I
> >> mean that parsing a one megabyte document might be quite a bit slower
> >> than parsing many smaller documents, so simply wrapping things in an
> >> array could be bad.
> >
> > The new native C parser in JavaScript is fine with anything this size,
> > and I believe Damien just wrote an evented JSON parser which should
> > make this more acceptable on the client side. One good idea I think
> > jchris had was, instead of a number-of-documents threshold, to have a
> > byte-length restriction on the batch we send to the view server.
>
> Yeah, the new embedded JSON parser should be fine as long as we can
> motivate people to upgrade to a recent JavaScript library. My experience
> is more related to the Erlang side, as that's what I've done all of my
> comparisons against. I haven't done any testing on the streaming parser,
> but it'd be interesting to see how it behaves in relation to input doc
> size.

Jason's full build tarballs will hopefully help with that, but yes, I'm
always telling people to grab a newer SpiderMonkey.

> > The I/O time for large amounts of small documents is larger than you
> > would expect. I ran some tests a while back and there was more time
> > spent in stdio for simple map/reduce operations than there was in
> > processing on the view server.
>
> Did you run the experiment to try batching the updates across the wire?
> I'm not surprised that the transfer can take longer than the
> computation, but I'm not sure how much benefit you'd get from batching
> 100 or so docs. I reckon there'd be some, I just don't have any idea how
> much.

I did run the tests. It's not a size issue; it's the delay in readline()
and flush calls that, over thousands of small documents, ends up being
larger than the transfer time. I actually implemented this in the view
server and have it in a branch somewhere on GitHub, and it measured well
in my tests. There just was never the accompanying Erlang work to make it
happen.
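To make that concrete, the batched loop on the view server side could
look roughly like the sketch below. The "map_batch" command and the
mapDoc() helper are invented for illustration (the real protocol sends
one "map_doc" line per document); readline() and print() are the couchjs
stdio primitives mentioned above.

    var line;
    while ((line = readline()) !== null) {
      var cmd = JSON.parse(line);
      if (cmd[0] === "map_batch") {
        var results = [];
        for (var i = 0; i < cmd[1].length; i++) {
          // Run the registered map functions over each doc in the batch.
          results.push(mapDoc(cmd[1][i]));
        }
        // One readline()/flush round trip per batch instead of per doc,
        // which is where the savings over thousands of small docs comes
        // from.
        print(JSON.stringify(results));
      }
    }

Capping the batch by serialized byte length rather than doc count, as
jchris suggested, would also keep any single batch from hitting the
large-document parsing penalty.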
> > Of course the most time spent on view generation is still writing to
> > the btree, but that performance has already increased quite a bit, so
> > we're looking for other places we can optimize.
> >
> >> An alternative that I haven't seen anywhere else in this thread was
> >> an idea to tag every message passed to the view engine with a UUID.
> >> Then people can do all sorts of fancy things with the view engine,
> >> like async processing and so on and so forth. The downside being that
> >> the Saturday afternoon implementation of the view engine in language
> >> X now takes both Saturday and Sunday afternoon.
> >
> > So, this gets dicey really fast. I want the external process protocol
> > to go non-blocking and support this UUID-style communication, but I'm
> > really skeptical of it in the view server.
> >
> > The view server should do pure functional transforms; allowing it to
> > do I/O means that is no longer true. It's also not as simple as just
> > stamping the protocol with a UUID, because Erlang still needs to load
> > balance any number of external processes. When the view server no
> > longer solely blocks on processing, it becomes much harder to achieve
> > that load balancing.
>
> Well, the original proposal was that if we do an asynchronous
> message-passing thing with the view server then Erlang doesn't do the
> load balancing; the view server could become threaded or use a pre-fork
> server model and do the load balancing across multiple cores itself.
>
> But you reminded me of the point that convinced me not to experiment
> with the approach. If something causes the view engine to crash, you
> can end up affecting a lot of things that are unrelated. E.g., someone
> gets a 500 on a _show function because a different app had a bug in its
> view handling code that happened to be reindexing. With the current
> model the effects of errors are more isolated.
>
> >> Apologies for missing this thread earlier. Better late than never I
> >> guess.
> >>
> >> Paul Davis
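For reference, the UUID-tagged protocol discussed above would amount to
something like the sketch below on the requesting side. It is purely
illustrative: makeUUID() and send() are assumed helpers and the message
shape is made up, not the actual CouchDB view server protocol.

    // Out-of-order request/response matching via per-message ids.
    var pending = {};

    function request(cmd, args, callback) {
      var id = makeUUID();                // assumed helper
      pending[id] = callback;
      send(JSON.stringify({id: id, cmd: cmd, args: args}));
    }

    function onResponse(line) {
      var msg = JSON.parse(line);
      var cb = pending[msg.id];
      delete pending[msg.id];
      cb(msg.result);                     // dispatch to the original caller
    }

The flip side, as noted above, is that one misbehaving request can now
take down state shared by every caller multiplexed onto the process.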

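Coming back to the namespaced-modules idea at the top of the thread: in
design-doc terms the restriction could look something like the sketch
below, where only modules under a dedicated lib section are requirable
from map functions. The lib location and the require() path here are
illustrative assumptions, not a settled format.

    {
      "_id": "_design/app",
      "views": {
        "lib": {
          "helpers": "exports.norm = function (s) { return s.toLowerCase(); };"
        },
        "by_name": {
          "map": "function (doc) { var h = require('views/lib/helpers'); emit(h.norm(doc.name), null); }"
        }
      }
    }

Since Erlang can enumerate everything under lib without tracing JS code
paths, the view signature can simply fold in a hash of the sorted module
sources, so only edits to those modules invalidate the index.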