Hi,
On Oct 24, 2008, at 18:32 , kowsik wrote:
Agreed on the armchair philosophy.
Cool! :) One thing though…
Given the huge potential for couch,
just trying to see how far we can push it from a scalability
perspective.
... nit-picking. Single code path execution speed has little to do
with scaling :)
On a related note, are there thoughts on how best to organize the
design views? Given how the couchjs <=> erlang interaction works, what
I'm finding is that if I have a number of map/reduce tuples in a given
design view, the "map_doc" command returns keys generated by all of
the map functions and sometimes there's 100's of KBytes (sometimes
MBytes) of JSON data exchanged between the two processes. If I split
them up, then each design view (with now just one map/reduce tuple)
runs pretty fast and there's no thrashing. I know it's too early to
talk about "best practices", but what's the "recommended" way of
organizing design views?
Trade-offs.
Consider two views A and B. If they reside in the same design document,
both are updated if you query either. Each doc has to be converted to JSON
and back to JS only once, and that single conversion serves both views. If
they live in separate design documents, each doc has to be converted twice:
once for each view. Grouping views in design documents makes better use of
your CPU cycles.
If one view in a group takes particularly long to update, you might want to
put that into a separate design document to speed up requests to your other
views.
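
A rough sketch of this trade-off, in Python for brevity. The function names,
the per-view cost numbers, and the assumption that every design document gets
queried (and so updated) are all made up for illustration; this is not how
CouchDB accounts for anything internally.

```python
def conversions(num_docs, design_docs):
    """Doc conversions needed to bring every design document up to date.

    design_docs is a list of lists of view names. Assumption: each doc is
    converted once per design document, no matter how many views it holds.
    """
    return num_docs * len(design_docs)


def work_before_answer(view, design_docs, cost):
    """Relative update work that has to finish before `view` can answer.

    Assumption: querying one view updates every view that shares its
    design document, so their (hypothetical) costs add up.
    """
    for views in design_docs:
        if view in views:
            return sum(cost[v] for v in views)
    raise KeyError(view)


grouped = [["A", "B"]]      # both views in one design document
split = [["A"], ["B"]]      # one design document per view
cost = {"A": 1, "B": 10}    # hypothetical: B is ten times slower than A

print(conversions(1000, grouped), conversions(1000, split))
print(work_before_answer("A", grouped, cost), work_before_answer("A", split, cost))
```

Grouping halves the conversions in this toy model, while splitting keeps the
slow view B from delaying queries to A.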
Design documents are designed (heh) to hold all information (even code)
for a single application. Having all views for an application in a single
design document follows this philosophy. There are valid cases to break this
rule, though. Note that your application will not be as simple to handle.
Cheers
Jan
--
On Thu, Oct 23, 2008 at 10:15 AM, Jan Lehnardt <[EMAIL PROTECTED]> wrote:
On Oct 23, 2008, at 19:08, kowsik wrote:
No, not explicitly, but I was looking at evalcx which sets up and
tears down a new JSContext each time a function is compiled. Just
wondering if there's an incremental way of doing this so we don't
incur the overhead each time reduce is invoked.
Armchair optimization is futile, sorry. Without the numbers to back it up,
it doesn't make any sense to change this.
Maybe with TraceMonkey, this is a moot point? ;-)
TraceMonkey is not there yet. Initial tests showed promising execution
speeds, but less than stellar memory usage. I expect to see improvements
in this area, though. It'll be good! :)
Cheers
Jan
--
K.
On Thu, Oct 23, 2008 at 9:42 AM, Jan Lehnardt <[EMAIL PROTECTED]> wrote:
On Oct 23, 2008, at 18:35, kowsik wrote:
I was seeing the couchjs process getting pegged and I was looking into it
further.
1. The protocol uses the "add_fun" command to add all the map functions to
the view server so that they are compiled and ready to go.
2. Then "map_doc" is invoked with each document to return the results
of the various emits.
3. Then "reduce" and "rereduce" are invoked depending on the query.
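
The three steps above can be sketched as a toy view server. This is only an
illustration in Python, not couchjs: the class name ToyViewServer is made up,
functions are Python lambda source strings instead of JavaScript, and the
message shapes (including reduce input as [[key, id], value] pairs) are
simplified assumptions rather than the exact wire format.

```python
import json


class ToyViewServer:
    """Simplified, hypothetical stand-in for the couchjs command loop."""

    def __init__(self):
        self.map_funs = []  # compiled map functions, in add_fun order

    @staticmethod
    def _compile(source):
        # Illustration only: eval() must never see untrusted input.
        return eval(source)

    def handle(self, line):
        cmd, *args = json.loads(line)
        if cmd == "reset":
            self.map_funs = []
            return json.dumps(True)
        if cmd == "add_fun":
            # Step 1: compile once, keep the function ready for map_doc.
            self.map_funs.append(self._compile(args[0]))
            return json.dumps(True)
        if cmd == "map_doc":
            # Step 2: run every registered map function on the document.
            doc, results = args[0], []
            for fun in self.map_funs:
                emitted = []
                fun(doc, lambda key, value: emitted.append([key, value]))
                results.append(emitted)
            return json.dumps(results)
        if cmd == "reduce":
            # Step 3: the reduce source arrives with every call and is
            # compiled again each time.
            sources, kvs = args
            keys = [key for key, _value in kvs]
            values = [value for _key, value in kvs]
            out = [self._compile(src)(keys, values, False) for src in sources]
            return json.dumps([True, out])
        raise ValueError("unknown command: %r" % cmd)


server = ToyViewServer()
server.handle(json.dumps(["add_fun", "lambda doc, emit: emit(doc['type'], 1)"]))
server.handle(json.dumps(["add_fun", "lambda doc, emit: emit(doc['_id'], doc['size'])"]))
print(server.handle(json.dumps(["map_doc", {"_id": "a", "type": "post", "size": 3}])))
```

Note that the "map_doc" reply carries one result list per registered map
function, so its size grows with the number of map functions in the design
view.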
It's #3 that I'm interested in. The "reduce" command currently passes
in the reduce function string from the design view each time it's
invoked. This means the reduce function is first eval'd and then
executed. I'm wondering if it's possible to rename "add_fun" to
"add_map_fun" and introduce a new "add_reduce_fun" and then reference
functions by index? Seems like it might speed things up quite a bit.
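
A minimal sketch of what the proposal could look like, again in Python rather
than JavaScript. "add_reduce_fun" is only the name suggested above, not an
existing command; the class name ReduceFunctionTable and the compile counter
are hypothetical and exist just to show that compilation happens once.

```python
class ReduceFunctionTable:
    """Hypothetical table of pre-compiled reduce functions."""

    def __init__(self):
        self._funs = []
        self.compile_count = 0  # how many times a source string was compiled

    def add_reduce_fun(self, source):
        """Compile the source once and return the index to call it by."""
        self.compile_count += 1
        # Illustration only: eval() must never see untrusted input.
        self._funs.append(eval(source))
        return len(self._funs) - 1

    def reduce(self, index, keys, values, rereduce=False):
        """Run an already compiled reduce function, referenced by index."""
        return self._funs[index](keys, values, rereduce)


table = ReduceFunctionTable()
total = table.add_reduce_fun("lambda keys, values, rereduce: sum(values)")
for batch in ([1, 2, 3], [4, 5], [6]):
    print(table.reduce(total, None, batch))
print("compilations:", table.compile_count)
```

Whether this saves meaningful time depends on how expensive compilation is
compared to the rest of the round trip, which has to be measured.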
Have you measured the reduce function compilation to be a bottleneck?
I'd guess (see, haven't measured either :-) that Erlang-term-to-JSON
conversion and JSON parsing take up significantly more time.
Cheers
Jan
--