Make views twice as fast
------------------------

                 Key: COUCHDB-495
                 URL: https://issues.apache.org/jira/browse/COUCHDB-495
             Project: CouchDB
          Issue Type: Improvement
          Components: JavaScript View Server
            Reporter: Chris Anderson
             Fix For: 0.11


Devs,

Damien has identified view collation as the most significant bottleneck in
view generation. We've done some testing and written some preliminary patches,
and the upshot seems to be that even removing ICU from the collator is not a
significant boost. What does speed things up greatly is using raw Erlang term
comparison. E.g., using fun(A,B) -> A < B end instead of couch_view:less_json
provides a roughly 2x speedup.
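
To make the difference concrete, here is a minimal sketch comparing raw term
ordering with a type-aware ordering. Note that typed_less/2 and type_rank/1
are hypothetical stand-ins written for this illustration; they are not the
real couch_view:less_json (which also does ICU string collation), and the
module name is made up.

```erlang
-module(less_sketch).
-export([raw_less/2, typed_less/2, demo/0]).

%% Raw Erlang term comparison, as proposed above.
raw_less(A, B) -> A < B.

%% Hypothetical type ranking, loosely modeled on JSON collation
%% (null, false, true, numbers, strings, arrays, objects).
%% An assumption for illustration only, not CouchDB's code.
type_rank(null) -> 0;
type_rank(false) -> 1;
type_rank(true) -> 2;
type_rank(N) when is_number(N) -> 3;
type_rank(B) when is_binary(B) -> 4;
type_rank(L) when is_list(L) -> 5;
type_rank({_Props}) -> 6.

%% Compare by type rank first; fall back to raw comparison within a type.
%% (A simplification: real collation would recurse and use ICU for strings.)
typed_less(A, B) ->
    case {type_rank(A), type_rank(B)} of
        {R, R} -> A < B;
        {RA, RB} -> RA < RB
    end.

%% Sort the same keys with both less functions.
demo() ->
    Keys = [<<"b">>, 3, null, [1], true, <<"a">>, 1],
    %% lists:sort/2 wants a "less than or equal" fun, so negate the flip.
    Sort = fun(Less) ->
        lists:sort(fun(A, B) -> not Less(B, A) end, Keys)
    end,
    {Sort(fun raw_less/2), Sort(fun typed_less/2)}.
```

Calling less_sketch:demo() shows the two orderings side by side: the raw sort
puts numbers first, then atoms, then lists, then binaries, while the typed
sort puts null and true ahead of the numbers and strings ahead of arrays.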

However, the patch is challenging for a few reasons. Making the collation
strategy switchable at all is tough. It's actually quite easy to get an
alternate less function into the btree writer (all you've got to do is set it
in couch_view_group:init_group). The hard part is propagating the same less
function to the PassedEndFun. There's a secondary problem: when you use raw
term comparison, a lot of terms turn out to sort before nil or after {},
which we use as artificial first and last terms in the less_json function. So
just switching to raw collation alone will leave you with a view with
unreachable rows.
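
The sentinel problem follows from Erlang's built-in term order (numbers sort
before atoms, and lists and binaries sort after tuples). A small sketch, with
a made-up module and function name, that picks out the keys falling outside
the nil .. {} range under raw comparison:

```erlang
-module(sentinel_sketch).
-export([unreachable/1]).

%% Return the keys that raw term comparison places before the atom nil
%% or after the empty tuple {}, i.e. outside the artificial first/last
%% terms mentioned above.
unreachable(Keys) ->
    [K || K <- Keys, (K < nil) orelse (K > {})].
```

For example, sentinel_sketch:unreachable([1, false, null, true, <<"a">>,
[1,2], {[]}]) returns [1, false, <<"a">>, [1,2], {[]}]: of those keys only
null and true land between the two sentinels.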

I tried two different approaches to the problem last night, and both of them 
led to (instructive) dead ends. I'll attach them for illustration purposes.

The next line of attack we think should be tried is this:

First - remove _all_docs_by_seq, as it is just adding complexity to the
problem, and has been deprecated by _changes anyway. Along the same lines,
_all_docs should no longer use couch_httpd_view:make_view_fold_fun, as it has
completely different collation needs than regular views. We'll end up
duplicating a little code in the _all_docs implementation, but it should be
worth it because it will make the other work much simpler.

Once those changes have laid the groundwork, the next step is to change
make_view_fold_fun and couch_view:fold, so that couch_view:fold (and the
btree beneath it), rather than make_view_fold_fun, is responsible for
detecting when we've passed the endkey. That means make_passed_end_fun and
all references to PassedEnd and PassedEndFun will be stripped from
couch_httpd_view and moved to couch_btree.

couch_view:fold (and the underlying btree) will need to accept not just a 
start, but also an endkey. This will make it much easier to use the less fun 
that is stored on View#view.btree#btree.less to determine PassedEnd funs. This 
will move some complexity to the btree code from the view code, but will keep 
the concerns more aligned. This also means that the btree will need to accept 
not only an endkey for folds, but also an inclusive_end parameter.
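
As a rough sketch of what such a fold could look like, here is a version that
uses a sorted list of {Key, Value} rows as a stand-in for the btree. The
module name, the fold/5 signature, and the options tuple are all assumptions
for illustration; this is not the couch_btree API.

```erlang
-module(fold_sketch).
-export([fold/5]).

%% Fold over Rows (already sorted by Less) from StartKey up to EndKey.
%% The "passed end" check is derived from the same Less fun that ordered
%% the rows, so the caller no longer has to supply its own.
fold(Less, Rows, {StartKey, EndKey, InclusiveEnd}, Fun, Acc0) ->
    PassedEnd = case InclusiveEnd of
        true  -> fun(Key) -> Less(EndKey, Key) end;      % stop when Key > EndKey
        false -> fun(Key) -> not Less(Key, EndKey) end   % stop when Key >= EndKey
    end,
    %% Skip rows that sort before StartKey.
    FromStart = lists:dropwhile(fun({K, _}) -> Less(K, StartKey) end, Rows),
    fold_rows(FromStart, PassedEnd, Fun, Acc0).

fold_rows([], _PassedEnd, _Fun, Acc) ->
    Acc;
fold_rows([{Key, _} = Row | Rest], PassedEnd, Fun, Acc) ->
    case PassedEnd(Key) of
        true  -> Acc;
        false -> fold_rows(Rest, PassedEnd, Fun, Fun(Row, Acc))
    end.
```

With Less = fun(A,B) -> A < B end and rows keyed 1 through 4, folding over
{2, 3, true} visits keys 2 and 3, while {2, 3, false} visits only key 2.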

Once we have all these refactorings done, it will be easy to make the less fun 
for an index configurable, as both the index writer and the index reader will 
look for it in the same place (on the #btree record).
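
To illustrate the idea of writer and reader sharing one less fun, here is a
toy index where both insert/3 and range/3 read the comparison from the same
record field. The #index record and every function here are hypothetical; the
real code would keep the fun on the #btree record as described above.

```erlang
-module(index_sketch).
-export([new/1, insert/3, range/3]).

%% Hypothetical record: the less fun lives next to the data it orders.
-record(index, {less, rows = []}).

new(Less) -> #index{less = Less}.

%% Writer side: place the new row according to the stored less fun.
insert(Key, Value, #index{less = Less, rows = Rows} = Idx) ->
    {Before, After} = lists:splitwith(fun({K, _}) -> Less(K, Key) end, Rows),
    Idx#index{rows = Before ++ [{Key, Value} | After]}.

%% Reader side: select an inclusive key range using the same less fun.
range(StartKey, EndKey, #index{less = Less, rows = Rows}) ->
    [Row || {K, _} = Row <- Rows,
            not Less(K, StartKey),
            not Less(EndKey, K)].
```

Swapping in a different fun at new/1 (say, fun(A,B) -> A > B end for a
descending index) changes both how rows are stored and how ranges are read,
without touching either function.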

My aim is to start a discussion and get someone excited to work on this patch. 
Think of all the fast-views glory you'll get! Please ask questions and 
otherwise force me to clarify the above discussion.

