Ok - here are some more detailed stats. Note that this is couch-0.9.0 with HiPE enabled and the filter patch, on my MacBook Pro.
~53K db documents, ~1500 are type:restaurant.

We tested using Brian's bork.rb:

no filtering:
  bork.rb - returning no values = 68s
  bork.rb - returning 5 values per map(doc) call = 200s
  couchjs - returning no values = 93s
  couchjs - one doc emitted per type:restaurant = 104s

w/ filtering (selecting ~1500 docs out of 53K):
  couchjs - returning no values = 8.9s
  couchjs - one doc emitted per type:restaurant = 19s

A couple of notes:

- 53K docs apparently take 68s to be converted to JSON and received by the
  dummy server (with no docs emitted) - about 780 docs/second.
- couchjs is slower than bork.rb in this case (unsurprising - bork.rb isn't
  really parsing the data).
- Filtering on the couch side is an enormous win for our test case.
- K/V inserts: 5 emits * 53K docs = ~265K inserts in (200 - 68) = 132s, or
  roughly 2000 per second. This is a pretty big difference from Brian's
  results (8000/sec), although we're dealing with many more docs, and
  without comparing hardware specs it's difficult to draw conclusions.
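The exact map functions aren't included here; as a rough sketch (field
names are illustrative), the type:restaurant views in these tests look
something like this:

function(doc) {
  // Only type:restaurant docs (~1500 of the ~53K) produce rows.
  if (doc.type === 'restaurant') {
    // What we do today: emit the whole doc as the value.
    emit(doc._id, doc);
    // Alternative mentioned downthread: emit a null value and query
    // with ?include_docs=true instead.
    // emit(doc._id, null);
  }
}

The "returning no values" runs are the same shape with the emit() calls
removed.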
On Sat, Jul 4, 2009 at 11:39 AM, Scott Shumaker<[email protected]> wrote:
> Compiling with HiPE didn't seem to make any difference in performance. :(
>
> On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker<[email protected]> wrote:
>> I'll try that out tomorrow and post the results here.
>>
>> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis<[email protected]> wrote:
>>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<[email protected]> wrote:
>>>> One question, though: why are the emitted view results stored as
>>>> Erlang terms, as opposed to storing the JSON returned from the view
>>>> server - which is what you'll be serving to the clients anyway?
>>>>
>>>> If you skipped the reverse JSON->Erlang encoding, and additionally
>>>> stored a cached JSON copy of each document alongside the document
>>>> whenever a document in CouchDB was created/updated (which you could
>>>> incrementally generate in a separate Erlang process so you don't have
>>>> to slow down write performance) - and just passed this JSON copy to
>>>> the view, you could basically eliminate the JSON->Erlang conversion
>>>> overhead entirely (since it would only be done asynchronously).
>>>>
>>>> Even if you need to store the emitted view results back into Erlang,
>>>> you could have a special optimization case for emitting (key, doc) -
>>>> because you already have the document as both Erlang and JSON
>>>> (assuming you were storing cached JSON copies). And include_docs
>>>> would get faster, since you wouldn't need to do the JSON conversion
>>>> there either.
>>>>
>>>> Just a thought.
>>>>
>>> Premature optimization is the root of all evil? Have you tried
>>> compiling CouchDB with HiPE enabled? I'm inclined to agree with you
>>> that the large JSON values are probably a significant cause here.
>>> Assuming your Erlang is HiPE-enabled, you can do something like this
>>> to compile CouchDB:
>>>
>>> $ ./bootstrap
>>> $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
>>> $ make
>>> $ sudo make install
>>>
>>>> Scott
>>>>
>>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<[email protected]> wrote:
>>>>> I should mention that we tend to emit (doc._id, doc) in our views -
>>>>> as opposed to (doc._id, null) and using include_docs - because we
>>>>> found that doc._id, null gave us a 30% speedup on building the
>>>>> views, but cost us about the same on each additional hit to the view.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<[email protected]> wrote:
>>>>>> We see times that are considerably worse. We mostly have maps -
>>>>>> very few reduces. We have 40k objects, about 25 design docs, and 90
>>>>>> views. Although we're about to change the code to auto-generate the
>>>>>> design docs based on the view filters used (re: view filter patch)
>>>>>> to see if that helps.
>>>>>>
>>>>>> Maybe it's because we have larger objects - but re-indexing a
>>>>>> typical new view takes > 5 minutes (with view filtering off). Some
>>>>>> are worse. With view filtering on, some views can be quite fast -
>>>>>> some finish in about 10 seconds. Interestingly, reindexing all
>>>>>> views takes about an hour - with or without view filtering. I'm
>>>>>> guessing that a substantial part of the bottleneck is Erlang ->
>>>>>> JSON serialization. Many of our objects are heavily nested
>>>>>> structures and exceed 10k in size. One other note - when we tried
>>>>>> dropping in the optimized 'main.js' posted on the mailing list, we
>>>>>> saw an overall 20% speedup. Unfortunately, it wasn't compatible
>>>>>> with the authentication stuff, and the deployment was a bit wacky,
>>>>>> so we're holding off on that right now.
>>>>>>
>>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<[email protected]> wrote:
>>>>>>>
>>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>>>>
>>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>>>>
>>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>>>>
>>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed
>>>>>>>>>>> my toe on it, see
>>>>>>>>>>> https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>>>>
>>>>>>>>>> Nice work! I'd be interested to see what kind of performance
>>>>>>>>>> increase we get from Spidermonkey 1.8.1, which comes with
>>>>>>>>>> native JSON parsing/encoding. See here for details:
>>>>>>>>>> https://developer.mozilla.org/En/Using_native_JSON
>>>>>>>>>>
>>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>>>>>
>>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing
>>>>>>>>> about the new generation of JS VMs is we've seen greatly
>>>>>>>>> increased memory usage with earlier versions. Also the startup
>>>>>>>>> times might be longer, or shorter.
>>>>>>>>>
>>>>>>>>> Though I wonder if this can be improved by forking a JS process
>>>>>>>>> rather than spawning a new process.
>>>>>>>>>
>>>>>>>> Memory usage is a definite concern. I'm not sure I follow why
>>>>>>>> startup times would be important though. Am I missing something?
>>>>>>>
>>>>>>> Startup time isn't a huge concern, but it's something to consider.
>>>>>>> On a heavily loaded system, scripts that normally work might start
>>>>>>> to time out, requiring restarting the process. Lots of restarts
>>>>>>> may start to eat lots of CPU and memory IO.
>>>>>>>
>>>>>>> -Damien
>>>>>>>
>>>>>>>>> -Damien
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jason Davies
>>>>>>>>>>
>>>>>>>>>> www.jasondavies.com
