Nice work guys!

I don't really understand (yet) everything that you're talking about
here, but the issue title sounds really great!

Also very glad to hear that ICU was not really a bottleneck for collation.



On Tue, Sep 15, 2009 at 3:00 AM, Damien Katz (JIRA) <[email protected]> wrote:
>
>     [ 
> https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
>
> Damien Katz closed COUCHDB-495.
> -------------------------------
>
>    Resolution: Fixed
>
> We now have a raw collation option, and regular json collation is much faster 
> too.
>
>> Make views twice as fast
>> ------------------------
>>
>>                 Key: COUCHDB-495
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-495
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: JavaScript View Server
>>            Reporter: Chris Anderson
>>             Fix For: 0.11
>>
>>         Attachments: binary_collate.diff, couch_perf.py, less_json.patch, 
>> numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch, 
>> term_collate.diff
>>
>>
>> Devs,
>> Damien's identified view collation as the most significant bottleneck for 
>> the view generation. We've done some testing, and some preliminary patches, 
>> and the upshot seems to be that even removing ICU from the collator is not a 
>> significant boost. What does speed things up greatly is using raw Erlang
>> term comparison. E.g., replacing couch_view:less_json with
>> fun(A,B) -> A < B end provides a roughly 2x speedup.
>> However, the patch is challenging for a few reasons: Making the collation 
>> strategy switchable at all is tough. It's actually quite easy to get an 
>> alternate less function into the btree writer (all you've got to do is set 
>> it in couch_view_group:init_group). The hard part is propagating the same 
>> less function to the PassedEndFun. There's a secondary problem that when you 
>> use raw term comparison, a lot of terms turn out to come before nil, and 
>> after {}, which we use as artificial first and last terms in the less_json 
>> function. So just switching to raw collation alone will leave you with a 
>> view with unreachable rows.
>> I tried two different approaches to the problem last night, and both of them 
>> led to (instructive) dead ends. I'll attach them for illustration purposes.
>> The next line of attack we think should be tried is this:
>> First - remove _all_docs_by_seq, as it just adds complexity to the
>> problem, and has been deprecated by _changes anyway. Along the same lines,
>> _all_docs should no longer use couch_httpd_view:make_view_fold_fun, as it
>> has completely different collation needs than regular views. We'll end up
>> duplicating a little code in the _all_docs implementation, but it should be
>> worth it because it will make the other work much simpler.
>> Once those changes have laid the groundwork, the next step is to change
>> make_view_fold_fun and couch_view:fold, so that couch_view:fold (via the
>> btree), rather than make_view_fold_fun, is responsible for detecting when
>> we've passed the endkey. That means make_passed_end_fun and all references
>> to PassedEnd and PassedEndFun will be stripped from couch_httpd_view and
>> moved to couch_btree.
>> couch_view:fold (and the underlying btree) will need to accept not just a 
>> start, but also an endkey. This will make it much easier to use the less fun 
>> that is stored on View#view.btree#btree.less to determine PassedEnd funs. 
>> This will move some complexity to the btree code from the view code, but 
>> will keep the concerns more aligned. This also means that the btree will 
>> need to accept not only an endkey for folds, but also an inclusive_end 
>> parameter.
>> Once we have all these refactorings done, it will be easy to make the less 
>> fun for an index configurable, as both the index writer and the index reader 
>> will look for it in the same place (on the #btree record).
>> My aim is to start a discussion and get someone excited to work on this 
>> patch. Think of all the fast-views glory you'll get! Please ask questions 
>> and otherwise force me to clarify the above discussion.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
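For anyone else who is puzzled by the "terms come before nil, and after {}" remark in the quoted issue, here is a minimal sketch of the effect. It relies only on Erlang's built-in term order (number < atom < tuple < list < binary). The module name, the helper raw_less/2, and the sample keys are made up for illustration, and the key shapes (binaries for strings, lists for arrays, {Proplist} for objects) are an assumption about how view keys are represented, not something taken from the patches.

```erlang
-module(raw_collate_demo).
-export([demo/0]).

%% Raw term comparison, as in the fun(A,B) -> A < B end from the issue.
raw_less(A, B) -> A < B.

demo() ->
    %% Hypothetical sample keys (assumed shapes, see note above).
    Keys = [<<"a">>, [1, 2], {[{<<"k">>, 1}]}, true, null, false, 42],

    %% Under raw ordering this sorts as:
    %% [42, false, null, true, {[{<<"k">>,1}]}, [1,2], <<"a">>]
    Sorted = lists:sort(fun raw_less/2, Keys),

    %% Keys that sort before the atom nil (the artificial "first" term).
    %% Numbers and the atom false land here: [false, 42]
    BeforeNil = [K || K <- Keys, raw_less(K, nil)],

    %% Keys that sort after the empty tuple {} (the artificial "last" term).
    %% Non-empty tuples, lists and binaries land here:
    %% [<<"a">>, [1,2], {[{<<"k">>,1}]}]
    AfterEmpty = [K || K <- Keys, raw_less({}, K)],

    {Sorted, BeforeNil, AfterEmpty}.
```

If that reading is right, a fold bounded by nil and {} under raw ordering would skip everything in BeforeNil and AfterEmpty, which seems to be what "unreachable rows" refers to.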
