[ 
https://issues.apache.org/jira/browse/COUCHDB-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632390#comment-14632390
 ] 

ASF GitHub Bot commented on COUCHDB-2735:
-----------------------------------------

GitHub user kocolosk opened a pull request:

    https://github.com/apache/couchdb/pull/327

    Ensure PK uniqueness when merging updates from multiple clients

    We had been implicitly assuming that clients send us sorted groups, but 
unsurprisingly that's not always the case. The PR here fixes a case where we 
broke the sorting, and adds an additional sort inside the `couch_db_updater` 
server loop since the consequences of screwing this up are so severe.
    
    See [COUCHDB-2735](https://issues.apache.org/jira/browse/COUCHDB-2735).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/couchdb 2735-duplicate-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb/pull/327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #327
    
----
commit 61d33cb64124535571e61e6ba1b5f353fb297a40
Author: Klaus Trainer <klaus_trai...@apache.org>
Date:   2014-10-27T10:55:14Z

    Improve documentation of `cacert_file` ssl option
    
    The documentation was incorrect insofar that it only described its
    functionality for client verification, although the configuration is
    used for server verification as well.

commit 46f7b4e7e1e34f427b333fd97ddb02e475848607
Author: Robert Newson <rnew...@apache.org>
Date:   2015-05-05T14:03:44Z

    s/afrikan/afrikaans/g

commit 95cb436be30305efa091809813b64ef31af968c8
Author: Dave Cottlehuber <d...@apache.org>
Date:   2015-06-26T08:31:27Z

    build: support OTP-18.0

commit 5b1b3e155dd2909db75bed799f40f97c29410b19
Author: Adam Kocoloski <a...@cloudant.com>
Date:   2015-07-15T21:06:18Z

    Preserve bucket ordering during validation
    
    Document buckets are sorted by docid, but the validation code was
    reversing the buckets. If multiple clients send concurrent updates for
    the same document the broken sorting can result in duplicate documents.
    
    The particular structure of the patch here is chosen to match the 2.x
    codebase.
    
    COUCHDB-2735

commit 47d7b05fa63cb77ed7852a5d20f86720e6ac8de1
Author: Adam Kocoloski <a...@cloudant.com>
Date:   2015-07-17T23:20:36Z

    Ensure doc groups are sorted before merging them
    
    We had been implicitly assuming that clients send us sorted groups, but
    unsurprisingly that's not always the case. The additional sorting here
    should be redundant, but the consequences of merging unsorted groups are
    severe -- we can end up with uniqueness violations on the primary key in
    the database -- and so we add an additional sort here.
    
    COUCHDB-2735
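
The failure mode described in this commit message can be illustrated with a minimal Python sketch. This is not the Erlang code in `couch_db_updater`; the function names `merge_sorted_groups` and `safe_merge` and the `(docid, revs)` tuple layout are hypothetical, chosen only to show why a merge that assumes sorted input can emit the same docid twice when one client's groups arrive out of order, and why an extra (normally redundant) sort prevents it.

```python
def merge_sorted_groups(a, b):
    """Merge two lists of (docid, revs) groups.

    Assumes BOTH inputs are sorted by docid. Groups with the same docid
    are combined into one entry. If the assumption is violated, the same
    docid can appear more than once in the output.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif a[i][0] > b[j][0]:
            out.append(b[j])
            j += 1
        else:
            # Same docid from both clients: combine into a single group.
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def safe_merge(a, b):
    """Sort both inputs by docid before merging (the defensive extra sort)."""
    by_docid = lambda group: group[0]
    return merge_sorted_groups(sorted(a, key=by_docid), sorted(b, key=by_docid))


def docids(groups):
    return [docid for docid, _revs in groups]


client1 = [("a", ["1-x"]), ("b", ["1-y"])]
client2_unsorted = [("b", ["2-z"]), ("a", ["2-w"])]  # violates the assumption

broken = merge_sorted_groups(client1, client2_unsorted)
fixed = safe_merge(client1, client2_unsorted)

print(docids(broken))  # docid "a" shows up twice
print(docids(fixed))   # each docid appears once
```

In this sketch the unsorted input makes the merge walk past "a" in the second list before it has seen it, so the leftover "a" group is appended as if it were a new document.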

----


> Duplicate document _ids created under high edit load
> ----------------------------------------------------
>
>                 Key: COUCHDB-2735
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2735
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Database Core
>            Reporter: James Dingwall
>            Assignee: Adam Kocoloski
>
> Our database was created under CouchDB 1.2.1 and has been upgraded through 
> 1.3.1 to 1.6.1.  We have been running 1.6.1 since last September.
> We are finding that making a large number of edits to existing documents is 
> causing duplicated document _ids to be created in the _all_docs view:
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
> {"total_rows":11670,"offset":10577,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
> ]}
> Compacting the database will resolve this.
> # curl -X POST http://admin:password@127.0.0.1:5984/a2/_compact -H "Content-type: application/json" -d '{}'
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\"
> {"total_rows":11656,"offset":10564,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
> ]}
> The document is not in conflict at its starting revision, and no replication 
> uses this database as a target, so the problematic document is not being 
> written to via replication. For example, 
> curl -X GET 'http://127.0.0.1:5984/a000prodmaster/vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd?conflicts=true&deleted_conflicts=true' 
> just returns the document.
> Our edit process consists of a number of view functions and update handlers 
> which are connected by python code to add extra document fields.  We expect 
> that many documents will come up in multiple views so document update 
> conflicts are anticipated and handled in the python code.  Some of the edits 
> return([modified_doc, response]); others return([null, modified_doc]), 
> which are collected and submitted as bulk saves (all_or_nothing=false).
> When a document _id is duplicated it appears that views are calculated 
> using the older revision while modifications are written to the newer 
> revision.
> I am experiencing this regularly while testing an upgrade for a database 
> containing ~12000 documents and which will trigger ~26000 edits.  This 
> upgrade test is on a separate machine also running CouchDB 1.6.1 and 
> Erlang 18 but the same was observed with 17.5.
> This issue appears similar to COUCHDB-968 but we have never run the versions 
> that this affected.
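
For anyone wanting to check a database for the symptom described in the report, a small sketch along these lines may help. It only inspects the JSON body of an `_all_docs` response for ids that occur more than once; the helper name `find_duplicate_ids` is hypothetical, and the sample bodies are the two responses quoted above (before and after compaction). Fetching the body from a live server is left out deliberately.

```python
import json
from collections import Counter


def find_duplicate_ids(all_docs_body):
    """Return the sorted list of ids that appear more than once in an
    _all_docs response body (a JSON string with a "rows" array)."""
    rows = json.loads(all_docs_body)["rows"]
    counts = Counter(row["id"] for row in rows)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


DOC_ID = "vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd"

# Response quoted in the report before compaction: two rows for one id.
before_compaction = """{"total_rows":11670,"offset":10577,"rows":[
{"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
{"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
]}"""

# Response quoted in the report after compaction: a single row.
after_compaction = """{"total_rows":11656,"offset":10564,"rows":[
{"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
]}"""

print(find_duplicate_ids(before_compaction))
print(find_duplicate_ids(after_compaction))
```

Running the same check over a full `_all_docs` listing (no `key` parameter) would flag every affected document, assuming the duplicates show up there as they do in the keyed query.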



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
