[jira] [Commented] (COUCHDB-1743) Make the view server & protocol faster

Alexander Shorin (JIRA) Mon, 29 Apr 2013 07:54:17 -0700

    [ 
https://issues.apache.org/jira/browse/COUCHDB-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644548#comment-13644548
 ]


Alexander Shorin commented on COUCHDB-1743:
-------------------------------------------

As the author of the Python query server[1], I currently see the following
problems:

0. Map functions are executed as a group, but that is mostly because of the
CouchDB view index engine.

1. Legacy view API. The CouchDB 0.11 release introduced the `ddoc` command,
which handles all design functions except map and reduce.

Switching from `add_fun`/`add_lib`/`reduce`/`rereduce` to `ddoc` means that:

1.1 There is no need to keep a stack of compiled map functions and empty it on
each view index update
1.2 Reduce functions may be cached and may use the `require` function without
any overhead (COUCHDB-1202)
1.3 Since the whole ddoc cached on the query server side is invalidated on
every ddoc update, it may be worth implementing partial updates of it via
JSON Patch.
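
To make 1.3 a bit more concrete, here is a minimal sketch of a ddoc cache on
the query server side. The `new` method mirrors the existing
`["ddoc", "new", id, body]` command; the `patch` method and the idea of a
`["ddoc", "patch", id, ops]` command are my assumptions and not part of the
current protocol. `DDocCache` and `apply_patch` are hypothetical names, and
only a small subset of JSON Patch is handled.

```python
def apply_patch(doc, patch):
    """Apply a subset of JSON Patch: add/replace/remove on object members.

    Array indexes and the root path "" are not supported in this sketch.
    """
    for op in patch:
        # "/views/by_name/map" -> ["views", "by_name", "map"],
        # with JSON Pointer escapes (~1 -> "/", ~0 -> "~") decoded.
        parts = [p.replace("~1", "/").replace("~0", "~")
                 for p in op["path"].split("/")[1:]]
        parent = doc
        for key in parts[:-1]:
            parent = parent[key]
        if op["op"] in ("add", "replace"):
            parent[parts[-1]] = op["value"]
        elif op["op"] == "remove":
            del parent[parts[-1]]
        else:
            raise ValueError("unsupported op: %r" % op["op"])
    return doc


class DDocCache:
    """Hypothetical cache of design documents kept by the query server."""

    def __init__(self):
        self.ddocs = {}

    def new(self, ddoc_id, ddoc):
        # Existing behaviour: the whole cached ddoc is replaced.
        self.ddocs[ddoc_id] = ddoc
        return True

    def patch(self, ddoc_id, ops):
        # Assumed extension: only the changed members are touched.
        apply_patch(self.ddocs[ddoc_id], ops)
        return True
```

A real implementation would also have to drop any compiled functions that
live under the patched paths; that part is left out here.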

2. Better logging integration with CouchDB (standalone log file, more logging
levels etc.);
3. Or perhaps richer configuration (COUCHDB-1143);
4. Feature exchange during the initial handshake;
5. Non-iterative maps.

5.1 Currently, after sending the `map_doc` command, CouchDB expects the whole
result for it from the view server. This means that after receiving 1MiB of
JSON, the view server generates 200MiB of map results and pushes all of them
to CouchDB in a single shot. CouchDB parses them into records, builds the
B-tree and does other magic, but I feel that a lot of memory overhead could be
avoided if the view server sent map results in small chunks of key-value
pairs.
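
A rough sketch of what chunked output could look like on the view server side.
The `["chunk", ...]` and `["end"]` line formats are invented for illustration
and are not part of the current protocol; `stream_map_results` and
`word_positions` are hypothetical names.

```python
import json


def word_positions(doc):
    # Example map function written as a generator: one pair per word.
    for position, word in enumerate(doc["text"].split()):
        yield word, position


def stream_map_results(doc, map_fun, out, chunk_size=100):
    """Write pairs from map_fun(doc) to out in chunks of at most chunk_size."""
    chunk = []
    for key, value in map_fun(doc):
        chunk.append([key, value])
        if len(chunk) >= chunk_size:
            out.write(json.dumps(["chunk", chunk]) + "\n")
            chunk = []
    if chunk:
        # Flush the last, possibly smaller, chunk.
        out.write(json.dumps(["chunk", chunk]) + "\n")
    # Assumed end-of-results marker for this document.
    out.write(json.dumps(["end"]) + "\n")
```

With this shape the view server never holds more than `chunk_size` pairs in
memory, and CouchDB could start processing results before the map function
has finished.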

5.2 The view server is not able to "map" multiple docs in parallel. With the
new mrview engine this is no longer strictly true: the view server can
"request" `map_doc` multiple times (via a readline() call, provided you're
prepared not to block on it), but it still has to return results in the
original order or they get mixed up. This could change if the view server
response contained the document id, so that CouchDB could determine which
document the results belong to.
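
As an illustration only, responses tagged with the document id could look
like the sketch below. The `{"id": ..., "results": ...}` response shape is an
assumption, not the current protocol, and `map_many` is a hypothetical helper.
Threads are used just to keep the example short; CPU-bound map functions in
Python would need processes to actually run in parallel.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def map_doc(doc):
    # Example map function: emit one [key, value] pair per tag.
    return [[tag, 1] for tag in doc.get("tags", [])]


def map_many(docs, max_workers=4):
    """Yield one JSON line per document in completion order, not input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which document each pending result belongs to.
        futures = {pool.submit(map_doc, doc): doc["_id"] for doc in docs}
        for future in as_completed(futures):
            yield json.dumps({"id": futures[future],
                              "results": future.result()})
```

Because every line carries the id, the receiver can match results to
documents regardless of the order in which they arrive.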

6. Missing batch processing. Pushing multiple documents to the query server at
once may speed up maps and validate_doc_update calls. However, CouchDB needs
to be a bit smart here and use batch sending for small docs, but not for large
ones, to prevent OOM problems.
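
The size-aware batching could follow logic like this. It is written in Python
only for illustration, since the real decision would be made on the CouchDB
side; `batch_docs` and the 64KiB default limit are assumptions of mine.

```python
import json


def batch_docs(docs, max_batch_bytes=64 * 1024):
    """Group docs into batches whose encoded size stays under the limit.

    A document larger than the limit ends up in a batch of its own.
    """
    batch, batch_size = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and batch_size + size > max_batch_bytes:
            # Adding this doc would exceed the limit: send what we have.
            yield batch
            batch, batch_size = [], 0
        batch.append(doc)
        batch_size += size
    if batch:
        yield batch
```

Small docs get grouped together, while a large doc is always sent alone, so
the memory used per batch stays bounded by roughly the limit or by the size
of the single largest document.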

On communication between the query server and CouchDB.

JSON support is still a key feature of everything that deals with CouchDB.
JSON over stdio may be slow, but it is very easy to implement in any language
today, so it can be left as is. In addition, CouchDB could provide something
better, faster and "native" to it. The first thing that comes to mind is
"query server as Erlang node". Why not? Many languages already have libraries
to talk to Erlang:
- Python: http://erlport.org/
- Ruby: https://github.com/paukul/intruder
- Java: http://pdincau.wordpress.com/2010/01/07/how-to-create-a-java-erlang-node-with-jinterface/
- Go: https://github.com/goerlang/node
- PHP: http://code.google.com/p/mypeb/
- NodeJS: https://github.com/rtomayko/node-bertrpc

And I hope other languages have something similar. For CouchDB this solution
would be ~zero cost, while the others would only suffer from the lack of a
fast binary term codec.

I feel most of these problems can be solved without rewriting the whole
protocol from scratch. It is also worth asking what problems a completely new
protocol aims to solve. One we already have: improving overall communication
speed. Any others? Thoughts?
                
> Make the view server & protocol faster
> --------------------------------------
>
>                 Key: COUCHDB-1743
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1743
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Dave Cottlehuber
>              Labels: couchdb, erlang, gsoc2013, html, javascript, nodejs, rest
>
> View server protocol enhancements/refactoring - unix sockets, pipelining, 
> different wire format etc. Faster!!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
