Nice work guys! I don't really understand (yet) everything that you're talking about here, but the issue title sounds really great!
Also very glad to hear that ICU was not really a bottleneck for collation.

On Tue, Sep 15, 2009 at 3:00 AM, Damien Katz (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Damien Katz closed COUCHDB-495.
> -------------------------------
>
> Resolution: Fixed
>
> We now have a raw collation option, and regular json collation is much
> faster too.
>
>> Make views twice as fast
>> ------------------------
>>
>> Key: COUCHDB-495
>> URL: https://issues.apache.org/jira/browse/COUCHDB-495
>> Project: CouchDB
>> Issue Type: Improvement
>> Components: JavaScript View Server
>> Reporter: Chris Anderson
>> Fix For: 0.11
>>
>> Attachments: binary_collate.diff, couch_perf.py, less_json.patch,
>> numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch,
>> term_collate.diff
>>
>>
>> Devs,
>> Damien's identified view collation as the most significant bottleneck
>> for view generation. We've done some testing, and some preliminary
>> patches, and the upshot seems to be that even removing ICU from the
>> collator is not a significant boost. What does speed things up greatly
>> is using raw Erlang term comparison. E.g., using
>> fun(A, B) -> A < B end instead of couch_view:less_json provides a
>> roughly 2x speedup.
>>
>> However, the patch is challenging for a few reasons: Making the
>> collation strategy switchable at all is tough. It's actually quite easy
>> to get an alternate less function into the btree writer (all you've got
>> to do is set it in couch_view_group:init_group). The hard part is
>> propagating the same less function to the PassedEndFun. There's a
>> secondary problem: when you use raw term comparison, a lot of terms
>> turn out to come before nil, and after {}, which we use as artificial
>> first and last terms in the less_json function. So just switching to
>> raw collation alone will leave you with a view with unreachable rows.
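
To make the "unreachable rows" point concrete for myself, here is a minimal Python sketch. It is not CouchDB code: `TYPE_RANK`, `raw_less`, and `reachable` are hypothetical names, terms are modelled as `(type, value)` pairs, and the ranking is a toy model of Erlang's cross-type term order restricted to the few types used here (number < atom < tuple < list < binary).

```python
# Toy model of Erlang's cross-type term order, restricted to the types
# used below: number < atom < tuple < list < binary.
TYPE_RANK = {"number": 0, "atom": 1, "tuple": 2, "list": 3, "binary": 4}


def raw_less(a, b):
    """Hypothetical stand-in for raw comparison, fun(A, B) -> A < B end."""
    return (TYPE_RANK[a[0]], a[1]) < (TYPE_RANK[b[0]], b[1])


# The artificial first and last keys mentioned in the issue.
NIL = ("atom", "nil")
EMPTY_OBJ = ("tuple", ())


def reachable(key, less=raw_less):
    """A row is reachable only if it sorts between the two sentinels."""
    return not less(key, NIL) and not less(EMPTY_OBJ, key)


keys = [
    ("number", 42),        # sorts before nil under raw comparison
    ("binary", b"apple"),  # sorts after {} under raw comparison
    ("list", [1, 2]),      # sorts after {} under raw comparison
    ("atom", "true"),      # sits between nil and {}
]

unreachable = [k for k in keys if not reachable(k)]
```

If I read the issue correctly, numbers fall before the `nil` sentinel and strings and arrays fall after the `{}` sentinel, so a range bounded by those two sentinels would skip them.
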
>>
>> I tried two different approaches to the problem last night, and both of
>> them led to (instructive) dead ends. I'll attach them for illustration
>> purposes. The next line of attack we think should be tried is this:
>>
>> First - remove _all_docs_by_seq, as it is just adding complexity to the
>> problem, and has been deprecated by _changes anyway. Along the same
>> lines, _all_docs should no longer use
>> couch_httpd_view:make_view_fold_fun, as it has completely different
>> collation needs than make_view_fold_fun. We'll end up duplicating a
>> little code in the _all_docs implementation, but it should be worth it
>> because it will make the other work much simpler.
>>
>> Once those changes have laid the groundwork, the next step is to change
>> make_view_fold_fun and couch_view:fold, so that the btree, rather than
>> make_view_fold_fun, is responsible for detecting when we've passed the
>> endkey. That means make_passed_end_fun and all references to PassedEnd
>> and PassedEndFun will be stripped from couch_httpd_view and moved to
>> couch_btree.
>>
>> couch_view:fold (and the underlying btree) will need to accept not just
>> a start, but also an endkey. This will make it much easier to use the
>> less fun that is stored on View#view.btree#btree.less to determine
>> PassedEnd funs. This will move some complexity to the btree code from
>> the view code, but will keep the concerns more aligned. This also means
>> that the btree will need to accept not only an endkey for folds, but
>> also an inclusive_end parameter.
>>
>> Once we have all these refactorings done, it will be easy to make the
>> less fun for an index configurable, as both the index writer and the
>> index reader will look for it in the same place (on the #btree record).
>>
>> My aim is to start a discussion and get someone excited to work on this
>> patch. Think of all the fast-views glory you'll get! Please ask
>> questions and otherwise force me to clarify the above discussion.
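
As far as I can follow the proposed refactoring, the idea is that the fold itself takes the endkey and an inclusive_end flag and decides "passed the end" with the same less function the writer used. A rough Python sketch of that shape, under my own assumptions: the `Btree` class, the `fold` signature, and the `collect` helper are all hypothetical stand-ins, not the real couch_btree API, and `None` is used here to mean "no bound".

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Btree:
    """Hypothetical stand-in for the #btree record: rows plus the less fun."""
    rows: List[Any]                   # assumed already sorted by `less`
    less: Callable[[Any, Any], bool]


def fold(btree, fun, acc, start_key=None, end_key=None, inclusive_end=True):
    """Fold over rows in [start_key, end_key], using the btree's own less fun."""
    for key in btree.rows:
        if start_key is not None and btree.less(key, start_key):
            continue  # still before the start of the range
        if end_key is not None:
            if inclusive_end:
                past_end = btree.less(end_key, key)      # key > end_key
            else:
                past_end = not btree.less(key, end_key)  # key >= end_key
            if past_end:
                break  # the reader never needs its own PassedEnd logic
        acc = fun(key, acc)
    return acc


def collect(key, acc):
    """Accumulate visited keys into a new list."""
    return acc + [key]


ascending = Btree(rows=[1, 2, 3, 4, 5], less=lambda a, b: a < b)
descending = Btree(rows=[5, 4, 3, 2, 1], less=lambda a, b: a > b)

closed_range = fold(ascending, collect, [], start_key=2, end_key=4)
open_range = fold(ascending, collect, [], start_key=2, end_key=4,
                  inclusive_end=False)
reversed_range = fold(descending, collect, [], start_key=4, end_key=2)
```

The point I take from this is that swapping the less function (ascending versus descending here, raw versus JSON collation in the issue) needs no change on the reading side, because the end-of-range test comes from the same place as the sort order.
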
