[
https://issues.apache.org/jira/browse/COUCHDB-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644548#comment-13644548
]
Alexander Shorin commented on COUCHDB-1743:
-------------------------------------------
As the author of a Python query server[1], I currently see the following
problems:
0. Grouped execution of maps, though that is mostly a consequence of the
CouchDB view index engine.
1. Legacy view API. The `ddoc` command, introduced in the CouchDB 0.11
release, handles all design functions except map and reduce.
Switching from `add_fun`/`add_lib`/`reduce`/`rereduce` to `ddoc` would mean
that:
1.1 There is no need to keep a stack of compiled map functions and empty it on
each view index update.
1.2 Reduce functions could be cached and could use the `require` function
without any overhead (COUCHDB-1202).
1.3 Since the whole cached ddoc on the query server side would be invalidated
on every ddoc update, it may be worth implementing partial updates of it via
JSON Patch.
2. Better logging integration with CouchDB (standalone log file, more logging
levels, etc.).
3. Or maybe richer configuration (COUCHDB-1143).
4. Feature exchange during the initial handshake.
5. Non-iterative maps.
5.1 Currently, after sending the `map_doc` command, CouchDB expects the whole
result for it from the view server. This means that after receiving 1MiB of
JSON, the view server may generate 200MiB of map results and push all of them
to CouchDB in a single shot. CouchDB parses them into records, builds the
B-tree and does other magic, but I feel that a lot of memory overhead could be
avoided if the view server sent map results in small chunks of key-value
pairs.
5.2 The view server has not been able to "map" multiple docs in parallel. With
the new mrview engine this is no longer true: the view server can "request"
`map_doc` multiple times (via a readline() call; hopefully you are prepared
not to block on it), but it still has to return results in the original order
or they get mixed up. This could change if the view server response contained
the document id, so that CouchDB could determine which document the results
belong to.
6. Missing batch processing. Pushing multiple documents to the query server at
once may speed up maps and validate_doc_updates. However, CouchDB needs to be
smart enough to batch small docs but not large ones, to prevent OOM problems.
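
A rough sketch of what points 5.1 and 5.2 could look like on the wire. Nothing
here is part of the existing protocol: the `map_chunk` and `map_done` message
names, the chunk size and both helper functions are hypothetical, chosen only
to show that results tagged with a document id can arrive in small pieces and
in any order and still be put back together.

```python
import json
from collections import defaultdict


def chunked_map_results(doc_id, pairs, chunk_size=2):
    """Yield JSON lines for one document's map results in small chunks.

    Hypothetical message shapes (not part of the current protocol):
      ["map_chunk", doc_id, [[key, value], ...]]
      ["map_done", doc_id]
    """
    for start in range(0, len(pairs), chunk_size):
        yield json.dumps(["map_chunk", doc_id, pairs[start:start + chunk_size]])
    yield json.dumps(["map_done", doc_id])


def reassemble(lines):
    """Receiving side: group chunks by document id, in whatever order they arrive."""
    pending = defaultdict(list)
    finished = {}
    for line in lines:
        message = json.loads(line)
        kind, doc_id = message[0], message[1]
        if kind == "map_chunk":
            pending[doc_id].extend(message[2])
        elif kind == "map_done":
            # A document that emitted nothing still gets an (empty) entry.
            finished[doc_id] = pending.pop(doc_id, [])
    return finished
```

Because every message names its document, the receiver never has to hold more
than the unfinished documents in memory, and it does not depend on the view
server answering in the order the documents were sent.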
About communication between the query server and CouchDB.
JSON support is still a key feature of everything that deals with CouchDB. JSON
via StdIO may be slow, but it is very easy to implement in any language
today, so it can be left as is. In addition, CouchDB may provide
something better, faster and "native" to it. The first thing that comes to
mind is "query server as an Erlang node". Why not? Many languages already have
libraries to talk to Erlang:
- Python: http://erlport.org/
- Ruby: https://github.com/paukul/intruder
- Java: http://pdincau.wordpress.com/2010/01/07/how-to-create-a-java-erlang-node-with-jinterface/
- Go: https://github.com/goerlang/node
- PHP: http://code.google.com/p/mypeb/
- NodeJS: https://github.com/rtomayko/node-bertrpc
And I hope others have something similar. So for CouchDB this solution will be
~zero cost, while the others will only suffer from the lack of a fast binary
term codec.
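
To show how little the JSON-over-StdIO side needs, here is a minimal sketch of
a line-based loop handling the legacy `reset`, `add_fun` and `map_doc`
commands. It is not the actual Python query server: the assumption that each
function source defines a generator named `map_fun`, the `handle`/`serve`
names and the shape of the error reply are all illustrative.

```python
import json
import sys


def handle(command, state):
    """Process one decoded command and return the value to send back."""
    name, args = command[0], command[1:]
    if name == "reset":
        state["funs"] = []
        return True
    if name == "add_fun":
        namespace = {}
        # Assumption: the source defines a generator called `map_fun`
        # that yields (key, value) pairs for a document.
        exec(args[0], namespace)
        state["funs"].append(namespace["map_fun"])
        return True
    if name == "map_doc":
        # One list of [key, value] pairs per registered map function.
        return [[list(pair) for pair in fun(args[0])] for fun in state["funs"]]
    # Illustrative error shape for commands this sketch does not know.
    return {"error": "unknown_command", "reason": name}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON command per line and write one JSON reply per line."""
    state = {"funs": []}
    for line in stdin:
        if not line.strip():
            continue
        stdout.write(json.dumps(handle(json.loads(line), state)) + "\n")
        stdout.flush()
```

Calling `serve()` runs the loop on the process's standard streams. Everything
else in the protocol is more of the same, which is why this transport is cheap
to support in any language.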
I feel that most of these problems can be solved without rewriting the whole
protocol from scratch. It is also a good question which problems a completely
new protocol aims to solve. We have one so far: improving overall
communication speed. Any others? Thoughts?
> Make the view server & protocol faster
> --------------------------------------
>
> Key: COUCHDB-1743
> URL: https://issues.apache.org/jira/browse/COUCHDB-1743
> Project: CouchDB
> Issue Type: Improvement
> Reporter: Dave Cottlehuber
> Labels: couchdb, erlang, gsoc2013, html, javascript, nodejs, rest
>
> View server protocol enhancements/refactoring - unix sockets, pipelining,
> different wire format etc. Faster!!