[ https://issues.apache.org/jira/browse/COUCHDB-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632390#comment-14632390 ]
ASF GitHub Bot commented on COUCHDB-2735:
-----------------------------------------

GitHub user kocolosk opened a pull request:

    https://github.com/apache/couchdb/pull/327

    Ensure PK uniqueness when merging updates from multiple clients

    We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The PR here fixes a case where we broke the sorting, and adds an additional sort inside the `couch_db_updater` server loop since the consequences of screwing this up are so severe.

    See [COUCHDB-2735](https://issues.apache.org/jira/browse/COUCHDB-2735).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/couchdb 2735-duplicate-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb/pull/327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #327

----
commit 61d33cb64124535571e61e6ba1b5f353fb297a40
Author: Klaus Trainer <klaus_trai...@apache.org>
Date:   2014-10-27T10:55:14Z

    Improve documentation of `cacert_file` ssl option

    The documentation was incorrect insofar as it only described its functionality for client verification, although the configuration is used for server verification as well.

commit 46f7b4e7e1e34f427b333fd97ddb02e475848607
Author: Robert Newson <rnew...@apache.org>
Date:   2015-05-05T14:03:44Z

    s/afrikan/afrikaans/g

commit 95cb436be30305efa091809813b64ef31af968c8
Author: Dave Cottlehuber <d...@apache.org>
Date:   2015-06-26T08:31:27Z

    build: support OTP-18.0

commit 5b1b3e155dd2909db75bed799f40f97c29410b19
Author: Adam Kocoloski <a...@cloudant.com>
Date:   2015-07-15T21:06:18Z

    Preserve bucket ordering during validation

    Document buckets are sorted by docid, but the validation code was reversing the buckets. If multiple clients send concurrent updates for the same document, the broken sorting can result in duplicate documents.
    The particular structure of the patch here is chosen to match the 2.x codebase.

    COUCHDB-2735

commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1
Author: Adam Kocoloski <a...@cloudant.com>
Date:   2015-07-17T23:20:36Z

    Ensure doc groups are sorted before merging them

    We had been implicitly assuming that clients send us sorted groups, but unsurprisingly that's not always the case. The additional sorting here should be redundant, but the consequences of merging unsorted groups are severe -- we can end up with uniqueness violations on the primary key in the database -- and so we add an additional sort here.

    COUCHDB-2735

----

> Duplicate document _ids created under high edit load
> ----------------------------------------------------
>
> Key: COUCHDB-2735
> URL: https://issues.apache.org/jira/browse/COUCHDB-2735
> Project: CouchDB
> Issue Type: Bug
> Security Level: public(Regular issues)
> Components: Database Core
> Reporter: James Dingwall
> Assignee: Adam Kocoloski
>
> Our database was created under CouchDB 1.2.1 and has been upgraded through 1.3.1 to 1.6.1. We have been running 1.6.1 since last September.
> We are finding that making a large number of edits to existing documents is causing duplicated document _ids to be created in the _all_docs view:
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
> {"total_rows":11670,"offset":10577,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
> ]}
> Compacting the database resolves this.
> # curl -X POST http://admin:password@127.0.0.1:5984/a2/_compact -H "Content-type: application/json" -d '{}'
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
> {"total_rows":11656,"offset":10564,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
> ]}
> The document is not in conflict at its starting revision: curl -X GET 'http://127.0.0.1:5984/a000prodmaster/vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd?conflicts=true&deleted_conflicts=true' just returns the document. No databases have this database as a replication target, so the problematic document is not being written to via replication either.
> Our edit process consists of a number of view functions and update handlers, which are connected by Python code that adds extra document fields. We expect many documents to come up in multiple views, so document update conflicts are anticipated and handled in the Python code. Some of the edits are return([modified_doc, response]); others are return([null, modified_doc]), and these are collected and submitted as bulk saves (all_or_nothing=false).
> When a document _id is duplicated, it appears that views are calculated using the older revision while modifications are written to the newer revision.
> I am experiencing this regularly while testing an upgrade for a database containing ~12000 documents, which will trigger ~26000 edits. This upgrade test runs on a separate machine, also running CouchDB 1.6.1 and Erlang 18, but the same was observed with 17.5.
> This issue appears similar to COUCHDB-968, but we have never run the versions that it affected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
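The commit messages explain the failure only in prose: the updater merges doc groups from several clients assuming each batch is sorted by document ID, and an unsorted batch can lead to uniqueness violations on the primary key. The following is a minimal Python sketch of that idea, under a simplified, hypothetical model in which each batch is a list of groups and each group holds (client, doc_id) pairs for one document. The real code is Erlang inside couch_db_updater; merge_updates, sort_groups and duplicate_ids are illustrative names only, not CouchDB APIs.

```python
from typing import List, Tuple

# Hypothetical model: a group is a list of (client, doc_id) updates that all
# share one doc_id; a batch is a list of such groups.
Group = List[Tuple[str, str]]


def group_id(group: Group) -> str:
    """Return the doc_id shared by every update in the group."""
    return group[0][1]


def merge_updates(a: List[Group], b: List[Group]) -> List[Group]:
    """Merge two batches in a single linear pass.

    Correct only if both batches are sorted by doc_id: groups with equal IDs
    are combined, otherwise the group with the smaller ID is emitted first.
    """
    merged: List[Group] = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = group_id(a[i]), group_id(b[j])
        if x == y:
            merged.append(a[i] + b[j])  # same document: one combined group
            i += 1
            j += 1
        elif x < y:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # Whatever remains in either batch is appended unchanged.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


def sort_groups(batch: List[Group]) -> List[Group]:
    """Defensive step: sort a batch by doc_id before merging it."""
    return sorted(batch, key=group_id)


def duplicate_ids(batch: List[Group]) -> List[str]:
    """Return doc_ids that appear in more than one group after merging."""
    ids = [group_id(g) for g in batch]
    return sorted({i for i in ids if ids.count(i) > 1})


if __name__ == "__main__":
    client1 = [[("c1", "a")], [("c1", "b")]]  # sorted by doc_id
    client2 = [[("c2", "b")], [("c2", "a")]]  # reversed order
    print(duplicate_ids(merge_updates(client1, client2)))  # -> ['a']
    print(duplicate_ids(merge_updates(client1, sort_groups(client2))))  # -> []
```

With the reversed batch, the two updates for "a" never meet inside the loop, so they survive as two separate groups for one ID. Sorting each batch before the merge, which is what the second commit adds as a safety net, yields exactly one group per ID.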