[ https://issues.apache.org/jira/browse/COUCHDB-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632387#comment-14632387 ]

ASF subversion and git services commented on COUCHDB-2735:
----------------------------------------------------------

Commit 5b1b3e155dd2909db75bed799f40f97c29410b19 in couchdb's branch 
refs/heads/2735-duplicate-docs from [~kocolosk]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=5b1b3e1 ]

Preserve bucket ordering during validation

Document buckets are sorted by docid, but the validation code was
reversing the buckets. If multiple clients send concurrent updates for
the same document the broken sorting can result in duplicate documents.

The particular structure of the patch here is chosen to match the 2.x
codebase.

COUCHDB-2735
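
The commit message describes the failure only in outline. The sketch below is a hedged illustration: a hypothetical, simplified Python model, not CouchDB's Erlang implementation. It shows how an update step that merges docid-sorted input into a docid-sorted index in one forward pass can leave two entries for the same id when that input arrives in reverse order. The names `apply_sorted_updates` and `Entry` and the `(doc_id, rev)` layout are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical model: an index entry is (doc_id, rev); the index is sorted by doc_id.
Entry = Tuple[str, str]


def apply_sorted_updates(index: List[Entry], updates: List[Entry]) -> List[Entry]:
    """Merge updates into a doc_id-sorted index in a single forward pass.

    Like a merge-join, this assumes `updates` is sorted by doc_id as well.
    The cursor into `index` never moves backwards, so an update that arrives
    out of order can no longer find its existing entry.
    """
    result: List[Entry] = []
    i = 0
    for doc_id, rev in updates:
        # Copy through existing entries that sort before this update's doc_id.
        while i < len(index) and index[i][0] < doc_id:
            result.append(index[i])
            i += 1
        if i < len(index) and index[i][0] == doc_id:
            # Existing document found at the cursor: replace its revision.
            result.append((doc_id, rev))
            i += 1
        else:
            # Not found at the cursor: treated as a brand-new document.
            result.append((doc_id, rev))
    # Copy any remaining existing entries unchanged.
    result.extend(index[i:])
    return result


if __name__ == "__main__":
    index = [("a", "1-x"), ("b", "1-x"), ("c", "1-x")]
    updates = [("a", "2-y"), ("c", "2-y")]
    print(apply_sorted_updates(index, updates))
    # [('a', '2-y'), ('b', '1-x'), ('c', '2-y')]
    print(apply_sorted_updates(index, list(reversed(updates))))
    # [('a', '1-x'), ('b', '1-x'), ('c', '2-y'), ('a', '2-y')]
```

In the reversed run, the cursor has already passed `a` by the time the update for `a` arrives. The update is therefore appended as a new entry, and both the old (`1-x`) and the new (`2-y`) revision remain under the same id. This is much like the two `_all_docs` rows in the report below. Feeding the same updates in sorted order gives the correct result, which mirrors the intent of the fix: preserve the bucket ordering rather than reverse it.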


> Duplicate document _ids created under high edit load
> ----------------------------------------------------
>
>                 Key: COUCHDB-2735
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2735
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: Database Core
>            Reporter: James Dingwall
>            Assignee: Adam Kocoloski
>
> Our database was created under CouchDB 1.2.1 and has been upgraded through 
> 1.3.1 to 1.6.1.  We have been running 1.6.1 since last September.
> We are finding that making a large number of edits to existing documents is
> causing duplicate document _ids to appear in the _all_docs view:
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\";
> {"total_rows":11670,"offset":10577,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
> ]}
> Compacting the database will resolve this.
> # curl -X POST http://admin:password@127.0.0.1:5984/a2/_compact -H "Content-type: application/json" -d '{}'
> # curl -X GET http://127.0.0.1:5984/a2/_all_docs?key=\"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd\";
> {"total_rows":11656,"offset":10564,"rows":[
> {"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}}
> ]}
> The document is not in conflict at its starting revision, and no replication
> uses this database as a target, so the problematic document is not being
> written to via replication. For example, this request just returns the
> document:
> # curl -X GET 'http://127.0.0.1:5984/a000prodmaster/vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd?conflicts=true&deleted_conflicts=true'
> Our edit process consists of a number of view functions and update handlers
> which are connected by Python code to add extra document fields. We expect
> that many documents will come up in multiple views, so document update
> conflicts are anticipated and handled in the Python code. Some of the update
> handlers return [modified_doc, response], which saves the document directly;
> others return [null, modified_doc], and those documents are collected and
> submitted as bulk saves (all_or_nothing=false).
> When a document _id is duplicated, it appears that views are calculated
> using the older revision while modifications are written to the newer
> revision.
> I am experiencing this regularly while testing an upgrade for a database
> containing ~12000 documents, which will trigger ~26000 edits. This upgrade
> test runs on a separate machine, also running CouchDB 1.6.1 and Erlang 18,
> but the same behavior was observed with Erlang 17.5.
> This issue appears similar to COUCHDB-968, but we have never run the
> versions it affected.
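
Duplicates like the ones in the report can be spotted mechanically from an `_all_docs` response. The following is a minimal sketch in Python. It assumes the response body has already been fetched (for example with the `curl` command in the report) and is passed in as a string. `find_duplicate_ids` is a hypothetical helper name, and the function relies only on the `rows`, `id` and `value.rev` fields visible in the quoted output.

```python
import json
from collections import defaultdict
from typing import Dict, List


def find_duplicate_ids(all_docs_body: str) -> Dict[str, List[str]]:
    """Return {doc_id: [revs in row order]} for every id seen in more than one row.

    Assumes the _all_docs shape shown in the report:
    {"rows": [{"id": ..., "key": ..., "value": {"rev": ...}}, ...]}
    """
    revs_by_id: Dict[str, List[str]] = defaultdict(list)
    for row in json.loads(all_docs_body)["rows"]:
        # Rows for missing keys (e.g. from a `keys` query) carry an error, not an id.
        if "id" not in row:
            continue
        revs_by_id[row["id"]].append(row["value"]["rev"])
    # A healthy by-id index has exactly one row per document id.
    return {doc_id: revs for doc_id, revs in revs_by_id.items() if len(revs) > 1}


if __name__ == "__main__":
    # Pre-compaction response copied from the report.
    body = """{"total_rows":11670,"offset":10577,"rows":[
{"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"49-c2aa999386dbf20e3a88b72cccb678e0"}},
{"id":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","key":"vm-84082a94-0f1c-4eff-9216-7ac1e52ce9cd","value":{"rev":"14-984492669d302229de0fff2e1c0e4696"}}
]}"""
    print(find_duplicate_ids(body))
```

When run against the pre-compaction response, the function maps the duplicated `_id` to both of its revisions in row order. Against the post-compaction response it returns an empty dict.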



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
