On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<[email protected]> wrote:
> One question, though: Why are the emitted view results stored as
> erlang terms, as opposed to storing the JSON returned from the view
> server - which is what you'll be serving to the clients anyway?
>
> If you skipped the reverse json->erlang encoding, and additionally
> stored a cached json copy of each document alongside the document
> whenever a document in couchdb was created/updated (which you could
> incrementally generate in a separate erlang process so you don't have
> to slow down write performance) - and just pass this json copy to the
> view, you could basically eliminate the json->erlang conversion
> overhead entirely (since it would only be done asynchronously).
>
> Even if you need to store the emitted view results back into erlang,
> you could have a special optimization case for emitting (key, doc) -
> because you already have the document as both erlang/json (assuming
> you were storing cached json copies). And include_docs would get
> faster since you wouldn't need to do the json conversion there either.
>
> Just a thought.
>
Premature optimization is the root of all evil? Have you tried
compiling CouchDB with HiPE enabled? I'm inclined to agree with you
that the large JSON values are probably a significant cause here.
Assuming your Erlang is HiPE-enabled, you can do something like this to
compile CouchDB:
$ ./bootstrap
$ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
$ make
$ sudo make install
> Scott
>
> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<[email protected]> wrote:
>> I should mention that we tend to emit (doc._id, doc) in our views - as
>> opposed to emitting (doc._id, null) and using include_docs - because we
>> found that (doc._id, null) gave us a 30% speedup on building the views,
>> but cost us about the same on each additional hit to the view.
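
For anyone following along, the two view styles being compared can be sketched roughly as below. This is only an illustration under stated assumptions: emit() and runMap() here are hypothetical stand-ins so the snippet runs on its own (CouchDB's view server supplies the real emit), and the function names are made up.

```javascript
// Hypothetical stand-in for the view server's emit(); it just collects rows.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Style 1: store the whole document as the view value.
function mapWithDoc(doc) {
  emit(doc._id, doc);
}

// Style 2: store only the key; documents are fetched at query time
// by adding include_docs=true to the view request.
function mapKeyOnly(doc) {
  emit(doc._id, null);
}

// Hypothetical helper: run a map function over some docs and return its rows.
function runMap(mapFn, docs) {
  rows = [];
  docs.forEach(function (doc) { mapFn(doc); });
  return rows;
}
```

Style 1 writes a copy of every document into the index, which fits the slower build reported above; style 2 keeps the index small and moves the document lookup to each query.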
>>
>> Scott
>>
>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<[email protected]> wrote:
>>> We see times that are considerably worse. We mostly have maps - very
>>> few reduces. We have 40k objects, about 25 design docs, and 90 views.
>>> Although we're about to change the code to auto-generate the design
>>> docs based on the view filters used (re: view filter patch) - see if
>>> that helps.
>>>
>>> Maybe it's because we have larger objects - but re-indexing a typical
>>> new view takes > 5 minutes (with view filtering off). Some are worse.
>>> With view filtering on some can be quite fast - some views finish in
>>> like 10 seconds. Interestingly, reindexing all views takes about an
>>> hour - with or without view filtering. I'm guessing that a
>>> substantial part of the bottleneck is erlang -> json serialization.
>>> Many of our objects are heavily nested structures and exceed 10k in
>>> size. One other note - when we tried dropping in the optimized
>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>>> Unfortunately, it wasn't compatible with the authentication stuff, and
>>> the deployment was a bit wacky, so we're holding off on that right
>>> now.
>>>
>>>
>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<[email protected]> wrote:
>>>>
>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>
>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<[email protected]> wrote:
>>>>>>
>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>
>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>
>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my toe on it,
>>>>>>>> see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>
>>>>>>>
>>>>>>> Nice work! I'd be interested to see what kind of performance increase we
>>>>>>> get from Spidermonkey 1.8.1, which comes with native JSON parsing/encoding.
>>>>>>> See here for details: https://developer.mozilla.org/En/Using_native_JSON .
>>>>>>>
>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
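
As a rough illustration of what native JSON would give a view server, here is a minimal sketch. It assumes a JS engine that provides the built-in JSON object; decodeLine() and encodeRows() are hypothetical names, not CouchDB's actual view-server functions.

```javascript
// Hypothetical helpers built on the engine's native JSON object.
function decodeLine(line) {
  // Parse one JSON document received as a line of text.
  return JSON.parse(line);
}

function encodeRows(rows) {
  // Serialize emitted rows back to JSON text.
  return JSON.stringify(rows);
}

var doc = decodeLine('{"_id":"a","tags":["x","y"],"nested":{"n":1}}');
var out = encodeRows([[doc._id, doc.nested.n]]);
// out is the string '[["a",1]]'
```

Whether this is faster in practice than a script-level encoder would have to be measured; the sketch only shows the calls involved.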
>>>>>>
>>>>>> I'm not sure the new engine is such a no-brainer. One thing about the new
>>>>>> generation of JS VMs is we've seen greatly increased memory usage with
>>>>>> earlier versions. Also the startup times might be longer, or shorter.
>>>>>>
>>>>>> Though I wonder if this can be improved by forking a JS process rather than
>>>>>> spawning a new process.
>>>>>>
>>>>>
>>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>>> times would be important though. Am I missing something?
>>>>
>>>> Start up time isn't a huge concern, but it is something to consider. On
>>>> a heavily loaded system, scripts that normally work might start to time out,
>>>> requiring restarting the process. Lots of restarts may start to eat lots of
>>>> CPU and memory IO.
>>>>
>>>> -Damien
>>>>
>>>>
>>>>>
>>>>>> -Damien
>>>>>>
>>>>>>> --
>>>>>>> Jason Davies
>>>>>>>
>>>>>>> www.jasondavies.com
>>>>>>>