Compiling with HiPE didn't seem to make any difference in performance. :(
On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker<[email protected]> wrote:
> I'll try that out tomorrow and post the results here.
>
> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis<[email protected]> wrote:
>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<[email protected]> wrote:
>>> One question, though: Why are the emitted view results stored as
>>> erlang terms, as opposed to storing the JSON returned from the view
>>> server - which is what you'll be serving to the clients anyway?
>>>
>>> If you skipped the reverse json->erlang encoding, and additionally
>>> stored a cached json copy of each document alongside the document
>>> whenever a document in couchdb was created/updated (which you could
>>> incrementally generate in a separate erlang process so you don't have
>>> to slow down write performance) - and just pass this json copy to the
>>> view, you could basically eliminate the json->erlang conversion
>>> overhead entirely (since it would only be done asynchronously).
>>>
>>> Even if you need to store the emitted view results back into erlang,
>>> you could have a special optimization case for emitting (key, doc) -
>>> because you already have the document as both erlang/json (assuming
>>> you were storing cached json copies). And include_docs would get
>>> faster since you wouldn't need to do the json conversion there either.
>>>
>>> Just a thought.
>>>
>>
>> Premature optimization is the root of all evil? Have you tried
>> compiling CouchDB with HiPE enabled? I'm inclined to agree with you
>> that the large JSON values are probably a significant cause here.
>> Assuming your Erlang is HiPE enabled you can do something like this to
>> compile CouchDB:
>>
>> $ ./bootstrap
>> $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
>> $ make
>> $ sudo make install
>>
>>
>>> Scott
>>>
>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<[email protected]> wrote:
>>>> I should mention that we tend to emit (doc._id, doc) in our views - as
>>>> opposed to doc._id, null and using include_docs - because we found
>>>> that doc._id,null gave us a 30% speedup on building the views, but
>>>> cost us about the same on each additional hit to the view.
>>>>
>>>> Scott
>>>>
>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<[email protected]> wrote:
>>>>> We see times that are considerably worse. We mostly have maps - very
>>>>> few reduces. We have 40k objects, about 25 design docs, and 90 views.
>>>>> Although we're about to change the code to auto-generate the design
>>>>> docs based on the view filters used (re: view filter patch) - see if
>>>>> that helps.
>>>>>
>>>>> Maybe it's because we have larger objects - but re-indexing a typical
>>>>> new view takes > 5 minutes (with view filtering off). Some are worse.
>>>>> With view filtering on some can be quite fast - some views finish in
>>>>> like 10 seconds. Interestingly, reindexing all views takes about an
>>>>> hour - with or without view filtering. I'm guessing that a
>>>>> substantial part of the bottleneck is erlang -> json serialization.
>>>>> Many of our objects are heavily nested structures and exceed 10k in
>>>>> size. One other note - when we tried dropping in the optimized
>>>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>>>>> Unfortunately, it wasn't compatible with the authentication stuff, and
>>>>> the deployment was a bit wacky, so we're holding off on that right
>>>>> now.
>>>>>
>>>>>
>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<[email protected]> wrote:
>>>>>>
>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>>>
>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>>>
>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed
>>>>>>>>>> my toe on it,
>>>>>>>>>> see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nice work! I'd be interested to see what kind of performance
>>>>>>>>> increase we get from Spidermonkey 1.8.1, which comes with native
>>>>>>>>> JSON parsing/encoding.
>>>>>>>>> See here for details:
>>>>>>>>> https://developer.mozilla.org/En/Using_native_JSON .
>>>>>>>>>
>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>>>>
>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing about
>>>>>>>> the new generation of JS VMs is we've seen greatly increased memory
>>>>>>>> usage with earlier versions. Also the startup times might be
>>>>>>>> longer, or shorter.
>>>>>>>>
>>>>>>>> Though I wonder if this can be improved by forking a JS process
>>>>>>>> rather than spawning a new process.
>>>>>>>>
>>>>>>>
>>>>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>>>>> times would be important though. Am I missing something?
>>>>>>
>>>>>> Start up time isn't a huge concern, but it is something to consider.
>>>>>> On a heavily loaded system, scripts that normally work might start to
>>>>>> time out, requiring restarting the process. Lots of restarts may start
>>>>>> to eat lots of cpu and memory IO.
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>>>
>>>>>>>> -Damien
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jason Davies
>>>>>>>>>
>>>>>>>>> www.jasondavies.com
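
For reference, the two view styles compared in the thread (emitting the whole document as the value versus emitting null and querying with include_docs) can be sketched roughly as below. This is only an illustration, not code from any of the posters: runMap, mapWithDoc, mapWithNull and the sample documents are hypothetical names made up here. In a real CouchDB design document the map function takes only doc, and emit() is supplied as a global by the view server; it is passed in explicitly here so the sketch runs standalone under Node.

```javascript
// Hypothetical harness (not part of CouchDB): collects what a map function emits.
function runMap(mapFn, docs) {
  const rows = [];
  // In CouchDB, emit() is a global provided by the view server.
  // Here it is passed in as a parameter so the sketch is self-contained.
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach((doc) => mapFn(doc, emit));
  return rows;
}

// Style A: emit the whole document as the value.
// The document is serialized into the view index at build time.
const mapWithDoc = (doc, emit) => emit(doc._id, doc);

// Style B: emit null and fetch the document at query time
// by adding include_docs=true to the view request.
const mapWithNull = (doc, emit) => emit(doc._id, null);

// Made-up sample documents standing in for large, nested objects.
const docs = [
  { _id: "a", nested: { items: [1, 2, 3], meta: { tags: ["x", "y"] } } },
  { _id: "b", nested: { items: [], meta: {} } },
];

const rowsA = runMap(mapWithDoc, docs);
const rowsB = runMap(mapWithNull, docs);

// Rough proxy for how much JSON each style pushes through the view server.
const sizeA = JSON.stringify(rowsA).length;
const sizeB = JSON.stringify(rowsB).length;
console.log("emit(id, doc) bytes:", sizeA, "emit(id, null) bytes:", sizeB);
```

The byte counts are only a crude stand-in for serialization work; they say nothing about actual indexing or query times, which depend on the CouchDB version and the data.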
