On Mon, Aug 2, 2010 at 2:54 PM, Paul Davis <[email protected]> wrote:
> On Mon, Aug 2, 2010 at 5:34 PM, Mikeal Rogers <[email protected]> wrote:
> >>
> >> For the first point about CommonJS modules in Map/Reduce views I'd say
> >> the goal is fine, but I don't understand how or why you'd want that
> >> hash to happen in JavaScript. Unless I'm mistaken, aren't the import
> >> statements executable JS? As in, is there any requirement that you
> >> couldn't import a module inside your map function? In which case, JS
> >> can't really hash all imported modules until after all possible code
> >> paths have been traced?
> >>
> >> I think a better answer would be to allow CommonJS modules, but only
> >> in some namespace of the design document. (IIRC, the other functions
> >> can pull from anywhere, but that would make all design doc updates
> >> trigger view regeneration.) Then Erlang just loads this namespace and
> >> anything that could be imported is included in the hash somehow (a
> >> hash of sorted hashes or some such).
> >
> > This is an interesting idea and I think I like it more than my original
> > proposal. My fear with the original proposal was that it might be
> > opaque to most users what will invalidate their views if we start doing
> > fancy invalidation on modules they use. If we re-scope or restrict the
> > module support to an attribute, that would make it very clear that
> > changes to those modules will invalidate the view.
> >
> >> Batching docs across the I/O might not give you as much of a
> >> performance improvement as you'd think. There's a pretty nasty time
> >> explosion on parsing larger JSON documents in some of the various
> >> parsers I've tried. I've noticed this on various pure Erlang parsers,
> >> but I wouldn't be surprised if json.js suffered as well. By this I
> >> mean that parsing a one megabyte document might be quite a bit slower
> >> than parsing many smaller documents, so simply wrapping things in an
> >> array could be bad.
> >
> > The new native C parser in JavaScript is fine with anything this size,
> > and I believe Damien just wrote an evented JSON parser which should
> > make this more acceptable on the client side. One good idea I think
> > jchris had was, instead of a number-of-documents threshold, to have a
> > byte-length restriction on the batch we send to the view server.
>
> Yeah, the new embedded JSON parser should be fine as long as we can
> motivate people to upgrade to a recent JavaScript library. My experience
> is more related to the Erlang side, as that's what I've done all of my
> comparisons against. I haven't done any testing on the streaming parser,
> but it'd be interesting to see how it behaves in relation to input doc
> size.

Jason's full build tarballs will hopefully help with that, but yes, I'm
always telling people to grab a newer SpiderMonkey.

> > The I/O time for large amounts of small documents is larger than you
> > would expect. I ran some tests a while back and there was more time
> > spent in stdio for simple map/reduce operations than there was in
> > processing on the view server.
>
> Did you run the experiment to try batching the updates across the wire?
> I'm not surprised that the transfer can take longer than the
> computation, but I'm not sure how much benefit you'd get from batching
> 100 or so docs. I reckon there'd be some, I just don't have any idea how
> much.

I did run the tests. It's not a size issue; it's the delay in readline()
and flush calls that, over thousands of small documents, ends up being
larger than the transfer time. I actually implemented this in the view
server and have it in a branch somewhere on GitHub, and it measured well
in my tests. There just was never the accompanying Erlang work to make it
happen.
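To make that concrete, the batched loop on the view server side could
look roughly like the sketch below. The "map_batch" command and the
mapDoc() helper are invented for illustration (the real protocol sends
one "map_doc" line per document); readline() and print() are the couchjs
stdio primitives mentioned above.

    var line;
    while ((line = readline()) !== null) {
      var cmd = JSON.parse(line);
      if (cmd[0] === "map_batch") {
        var results = [];
        for (var i = 0; i < cmd[1].length; i++) {
          // Run the registered map functions over each doc in the batch.
          results.push(mapDoc(cmd[1][i]));
        }
        // One readline()/flush round trip per batch instead of per doc,
        // which is where the savings over thousands of small docs comes
        // from.
        print(JSON.stringify(results));
      }
    }

Capping the batch by serialized byte length rather than doc count, as
jchris suggested, would also keep any single batch from hitting the
large-document parsing penalty.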
> > Of course the most time spent on view generation is still writing to
> > the btree, but that performance has already increased quite a bit, so
> > we're looking for other places we can optimize.
> >
> >> An alternative that I haven't seen anywhere else in this thread was
> >> an idea to tag every message passed to the view engine with a UUID.
> >> Then people can do all sorts of fancy things with the view engine,
> >> like async processing and so on and so forth. The downside being that
> >> the Saturday afternoon implementation of the view engine in language
> >> X now takes both Saturday and Sunday afternoon.
> >
> > So, this gets dicey really fast. I want the external process protocol
> > to go non-blocking and support this UUID-style communication, but I'm
> > really skeptical of it in the view server.
> >
> > The view server should do pure functional transforms; allowing it to
> > do I/O means that is no longer true. It's also not as simple as just
> > stamping the protocol with a UUID, because Erlang still needs to load
> > balance any number of external processes. When the view server no
> > longer solely blocks on processing, it becomes much harder to achieve
> > that load balancing.
>
> Well, the original proposal was that if we do an asynchronous
> message-passing thing with the view server then Erlang doesn't do the
> load balancing; the view server could become threaded or use a pre-fork
> server model and do the load balancing across multiple cores itself.
>
> But you reminded me of the point that convinced me not to experiment
> with the approach. If something causes the view engine to crash, you
> can end up affecting a lot of things that are unrelated. E.g., someone
> gets a 500 on a _show function because a different app had a bug in its
> view handling code that happened to be reindexing. With the current
> model the effects of errors are more isolated.
>
> >> Apologies for missing this thread earlier. Better late than never I
> >> guess.
> >>
> >> Paul Davis
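For reference, the UUID-tagged protocol discussed above would amount to
something like the sketch below on the requesting side. It is purely
illustrative: makeUUID() and send() are assumed helpers and the message
shape is made up, not the actual CouchDB view server protocol.

    // Out-of-order request/response matching via per-message ids.
    var pending = {};

    function request(cmd, args, callback) {
      var id = makeUUID();                // assumed helper
      pending[id] = callback;
      send(JSON.stringify({id: id, cmd: cmd, args: args}));
    }

    function onResponse(line) {
      var msg = JSON.parse(line);
      var cb = pending[msg.id];
      delete pending[msg.id];
      cb(msg.result);                     // dispatch to the original caller
    }

The flip side, as noted above, is that one misbehaving request can now
take down state shared by every caller multiplexed onto the process.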

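Coming back to the namespaced-modules idea at the top of the thread: in
design-doc terms the restriction could look something like the sketch
below, where only modules under a dedicated lib section are requirable
from map functions. The lib location and the require() path here are
illustrative assumptions, not a settled format.

    {
      "_id": "_design/app",
      "views": {
        "lib": {
          "helpers": "exports.norm = function (s) { return s.toLowerCase(); };"
        },
        "by_name": {
          "map": "function (doc) { var h = require('views/lib/helpers'); emit(h.norm(doc.name), null); }"
        }
      }
    }

Since Erlang can enumerate everything under lib without tracing JS code
paths, the view signature can simply fold in a hash of the sorted module
sources, so only edits to those modules invalidate the index.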