mikerhodes opened a new pull request, #4394:
URL: https://github.com/apache/couchdb/pull/4394
## Overview

This PR aims to improve Mango by reducing the data transferred to the coordinator during query execution. It may also reduce memory or CPU use at the coordinator, but that isn't the primary goal.

Currently, when documents are read at the shard level, they are compared locally at the shard with the selector to ensure they match before they are sent to the coordinator. This ensures we're not sending documents across the network that the coordinator would immediately discard, saving bandwidth and coordinator processing. This PR additionally executes field projection (`fields` in the query) at the shard level, which should save further bandwidth, particularly for queries that project a few fields from large documents.

One item of complexity is that a query may request a quorum read of documents, meaning that we need to do the document read at the coordinator and not the shard, then perform the `selector` and `fields` processing there rather than at the shard. To ensure that documents are processed consistently whether at the shard or the coordinator, `match_and_extract_doc/3` is added (sketched below). There is still one orphan call to `extract/2` outside `match_and_extract_doc/3`; it supports cluster upgrade and should later be removed.

Shard-level processing is already performed in a callback, `view_cb/2`, that's passed to fabric's view processing to run for each row in the view result set. It's used for the shard-local selector and fields processing. To make it clear what arguments are destined for this callback, the PR encapsulates them using `viewcbargs_new/2` and `viewcbargs_get/2` (sketched below). As we push more functionality down to the shard, the context this callback needs to carry will grow, so having a record for it will be valuable.

Supporting cluster upgrades: the PR supports shard pushdown of Mango `fields` processing during rolling cluster upgrades. (Cloudant require this, as they use rolling upgrades.) When a non-upgraded coordinator is speaking to an upgraded node, `view_cb/2` needs to support being passed just the `selector`, outside of the new viewcbargs structure. In that case, the shard will not process fields, but the coordinator will. In the opposite situation, where the coordinator is upgraded but the shard is not, we need to send the selector to the shard via `selector` and also execute the fields projection at the coordinator. Therefore we pass arguments to `view_cb/2` via both `selector` and `callback_args`, and keep an apparently spurious field projection (`mango_fields:extract/2`) in the code that receives values back from the shard (factored out into `doc_member_and_extract`; see the sketches below). Both of these affordances should only need to exist through one minor version change and can be removed thereafter -- if people are jumping several minor versions of CouchDB in one go, hopefully they are prepared for a bit of trouble.

Testing upgrade states: as `view_cb/2` is completely separate from the rest of the cursor code, we can first try out the branch's code with `view_cb/2` from `main`, and then the other way around -- the branch's `view_cb/2` with the rest of the file from `main`. I did both of these tests successfully.
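For a query like `{"selector": {"type": "user"}, "fields": ["name", "email"]}`, each shard can now strip matched documents down to `name` and `email` before they cross the network. A minimal sketch of the shared step, reusing the existing `mango_selector:match/2` and `mango_fields:extract/2` helpers (the return shape here is an assumption, not necessarily the PR's literal code):

```erlang
%% Sketch: a single code path for selector matching plus fields
%% projection, used at the shard normally and at the coordinator
%% when a quorum read forces the doc fetch to happen there.
match_and_extract_doc(Doc, Selector, Fields) ->
    case mango_selector:match(Selector, Doc) of
        true ->
            %% Project the requested fields before the doc travels on.
            FinalDoc = mango_fields:extract(Doc, Fields),
            {match, FinalDoc};
        false ->
            {no_match, Doc}
    end.
```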
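One plausible shape for the callback-argument encapsulation, assuming a simple map as the container (the overview above notes that a record may eventually be preferable as more state is pushed down to the shard):

```erlang
%% Sketch: bundle the arguments destined for view_cb/2, so that adding
%% more shard-side context later is a local change.
viewcbargs_new(Selector, Fields) ->
    #{selector => Selector, fields => Fields}.

viewcbargs_get(selector, Args) when is_map(Args) ->
    maps:get(selector, Args, undefined);
viewcbargs_get(fields, Args) when is_map(Args) ->
    maps:get(fields, Args, undefined).
```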
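During a rolling upgrade, `view_cb/2` has to cope with both argument styles. A hedged sketch of that dispatch, with `handle_callback_args/1` as a hypothetical helper name:

```erlang
%% Sketch: accept arguments from both old and new coordinators during
%% a rolling upgrade.
handle_callback_args(Options) ->
    case couch_util:get_value(callback_args, Options) of
        undefined ->
            %% Old coordinator: only `selector` arrives, so the shard
            %% matches documents but leaves `fields` projection to the
            %% coordinator.
            Selector = couch_util:get_value(selector, Options),
            viewcbargs_new(Selector, undefined);
        Args ->
            %% New coordinator: full args; the shard both matches the
            %% selector and projects the fields.
            Args
    end.
```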
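On the receive side, the "apparently spurious" projection can be pictured as below. Against upgraded shards it is a no-op, since the document already contains only the projected fields, but a non-upgraded shard returns whole documents that still need trimming. The function name and clauses are illustrative, not the PR's literal code:

```erlang
%% Sketch: the coordinator-side trim, factored out into
%% doc_member_and_extract in the PR.
extract_returned_doc(Doc, undefined) ->
    Doc;
extract_returned_doc(Doc, Fields) ->
    %% A no-op for already-projected docs from upgraded shards; trims
    %% whole documents sent back by non-upgraded shards.
    mango_fields:extract(Doc, Fields).
```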
## Testing recommendations

This PR should not change anything from an end-user perspective; Mango responses should remain the same as they are today. I have run some basic performance tests locally using k6.io, which showed no meaningful change in request latency.

## Related Issues or Pull Requests

None.

## Checklist

- [x] Code is written and works correctly
- [x] Changes are covered by tests
- [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
- [ ] Documentation changes were made in the `src/docs` folder
- [ ] Documentation changes were backported (separated PR) to affected branches
