[ 
https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751207#action_12751207
 ] 

Paul Joseph Davis commented on COUCHDB-495:
-------------------------------------------

Good sleuthing, Chris. Can you upload the script you were using to test view 
performance so we have a common method for measuring performance improvements?

> Make views twice as fast
> ------------------------
>
>                 Key: COUCHDB-495
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-495
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: JavaScript View Server
>            Reporter: Chris Anderson
>             Fix For: 0.11
>
>         Attachments: binary_collate.diff, term_collate.diff
>
>
> Devs,
> Damien's identified view collation as the most significant bottleneck in 
> view generation. We've done some testing, and some preliminary patches, and 
> the upshot seems to be that even removing ICU from the collator is not a 
> significant boost. What does speed things up greatly is using raw Erlang term 
> comparison. E.g., replacing couch_view:less_json with fun(A,B) -> A < B end 
> provides a roughly 2x speedup.
> However, the patch is challenging for a few reasons: Making the collation 
> strategy switchable at all is tough. It's actually quite easy to get an 
> alternate less function into the btree writer (all you've got to do is set it 
> in couch_view_group:init_group). The hard part is propagating the same less 
> function to the PassedEndFun. There's a secondary problem that when you use 
> raw term comparison, a lot of terms turn out to come before nil, and after 
> {}, which we use as artificial first and last terms in the less_json 
> function. So just switching to raw collation alone will leave you with a view 
> with unreachable rows.
> I tried two different approaches to the problem last night, and both of them 
> led to (instructive) dead ends. I'll attach them for illustration purposes.
> The next line of attack we think should be tried is this:
> First - remove _all_docs_by_seq, as it is just adding complexity to the 
> problem, and has been deprecated by _changes anyway. Along the same lines, 
> _all_docs should no longer use couch_httpd_view:make_view_fold_fun, as its 
> collation needs are completely different from those of regular views. We'll 
> end up duplicating a little code in the _all_docs implementation, but it 
> should be worth it because it will make the other work much simpler.
> Once those changes have laid the groundwork, the next step is to change 
> make_view_fold_fun and couch_view:fold so that the btree, rather than 
> make_view_fold_fun, is responsible for detecting when we've passed the 
> endkey. That means make_passed_end_fun and all references to PassedEnd and 
> PassedEndFun will be stripped from couch_httpd_view and moved to couch_btree.
> couch_view:fold (and the underlying btree) will need to accept not just a 
> start, but also an endkey. This will make it much easier to use the less fun 
> that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
> This will move some complexity to the btree code from the view code, but will 
> keep the concerns more aligned. This also means that the btree will need to 
> accept not only an endkey for folds, but also an inclusive_end parameter.
> Once we have all these refactorings done, it will be easy to make the less 
> fun for an index configurable, as both the index writer and the index reader 
> will look for it in the same place (on the #btree record).
> My aim is to start a discussion and get someone excited to work on this 
> patch. Think of all the fast-views glory you'll get! Please ask questions and 
> otherwise force me to clarify the above discussion.
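
The refactoring proposed in the issue can be illustrated with a minimal Python sketch (a model only, not CouchDB code) of a range fold in which one less function decides both the sort order and the end-of-range check. The names fold_range, start_key, end_key and raw_less are hypothetical, and using None for an open bound is an assumption of this sketch; inclusive_end mirrors the parameter named in the issue.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

# A "less" function returns True when its first argument sorts before its second.
Less = Callable[[Any, Any], bool]


def fold_range(
    rows: Iterable[Tuple[Any, Any]],
    less: Less,
    start_key: Optional[Any] = None,
    end_key: Optional[Any] = None,
    inclusive_end: bool = True,
) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) rows between start_key and end_key.

    `rows` must already be sorted by `less`. None means "no bound" here
    (an assumption of this sketch, standing in for sentinel keys).
    """
    for key, value in rows:
        if start_key is not None and less(key, start_key):
            continue  # still before the start of the range
        if end_key is not None:
            if inclusive_end:
                passed_end = less(end_key, key)      # key sorts after end_key
            else:
                passed_end = not less(key, end_key)  # key is at or after end_key
            if passed_end:
                break  # rows are sorted, so nothing later can match
        yield key, value


def raw_less(a: Any, b: Any) -> bool:
    """Plain comparison, analogous to fun(A,B) -> A < B end."""
    return a < b


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(list(fold_range(rows, raw_less, start_key=2, end_key=3)))
print(list(fold_range(rows, raw_less, start_key=2, end_key=3, inclusive_end=False)))
```

Because the range check only ever calls less, swapping in a different comparison changes the ordering and the bounds test together, which seems to be the property the issue is after.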

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
