[ https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971289#action_12971289 ]

Adam Kocoloski commented on COUCHDB-968:
----------------------------------------

I can definitely come up with a case where the path-based merging will fail to 
fully collapse the tree, but it depends on the ordering of the paths.  For 
example, consider a tree with _revs_limit = 2 that has 3 branches which share a 
common trunk that has been stemmed away, like so (o for an available revision, 
x for one that has been stemmed):

    o
    o
    x
    x o o
o o x
    x

Now I'll up the _revs_limit to 10 and replicate in a path that fills in all the 
Xs.  If I skip the non-recursive merge I'll be left with 3 completely separate 
branches, when in reality they all share the first two revisions.  When I go to 
do the stemming, the results will depend on the order in which the paths are 
merged.  If I sort the paths by how much they're cut off and add the longest 
one first, I'll be in good shape.  Each subsequent path will attach to the 
original one correctly.  If on the other hand I do the longest one last, I'll 
be left with two distinct branches after the merge.
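A toy model makes the order dependence concrete. This is a hypothetical sketch, not CouchDB's actual couch_key_tree code: a path is a pair (offset, revs), where offset counts how many ancestors were stemmed away, and the non-recursive merge only attaches a path to a tree that already holds the path's oldest rev at the same depth.

```python
# Hypothetical sketch of a non-recursive path merge; names and data
# model are illustrative, not CouchDB's real couch_key_tree internals.
def merge_paths(paths):
    """Attach each path to the first tree that already contains the
    path's oldest rev at the same depth; otherwise start a new tree."""
    trees = []  # each tree: {depth: set of rev ids seen at that depth}
    for offset, ids in paths:
        for tree in trees:
            if ids[0] in tree.get(offset, set()):
                # Oldest rev found at the right depth: graft the rest on.
                for depth, rev in enumerate(ids, start=offset):
                    tree.setdefault(depth, set()).add(rev)
                break
        else:
            trees.append({d: {r} for d, r in enumerate(ids, start=offset)})
    return trees

trunk   = (0, ["r1", "r2", "r3", "r4", "r5"])  # fully replicated branch
branch2 = (2, ["r3", "b4", "b5"])              # still stemmed at depth 2
branch3 = (2, ["r3", "c4"])

print(len(merge_paths([trunk, branch2, branch3])))  # longest first -> 1 tree
print(len(merge_paths([branch3, branch2, trunk])))  # longest last  -> 2 trees
```

With the longest path merged first, the stemmed branches find their attachment point (r3 at depth 2) and everything collapses into one tree; with it merged last, its root r1 has no attachment point in the existing trees and it survives as a separate branch.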

If sorting the paths before merging them in the stemmer is all that's required, 
this is a simple fix.  In fact, it looks like it already does this.  It doesn't 
do a global sort of all the paths, but it does appear to sort the paths that 
have nothing in common in order of how much they are cut off.  I think that may 
actually be sufficient.
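The sort the comment describes can be sketched in a few lines; the (offset, revs) representation below is illustrative, not CouchDB's actual internal shape. Ordering by how many revisions were cut off, fewest first, guarantees the longest path is merged before any path that must attach to it.

```python
# Hypothetical stemmed paths as (offset, revs) pairs; the offset is
# the number of ancestor revisions cut off by stemming.
stemmed = [(2, ["r3", "c4"]),        # 2 revs cut off
           (0, ["r1", "r2", "r3"]),  # full path, nothing cut off
           (2, ["r3", "b4", "b5"])]  # 2 revs cut off

# Sort ascending by amount cut off: least-stemmed (longest) path first.
stemmed.sort(key=lambda path: path[0])
assert [off for off, _ in stemmed] == [0, 2, 2]
```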

> Duplicated IDs in _all_docs
> ---------------------------
>
>                 Key: COUCHDB-968
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-968
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
>         Environment: any
>            Reporter: Sebastian Cohnen
>            Assignee: Adam Kocoloski
>            Priority: Blocker
>             Fix For: 0.11.3, 1.0.2, 1.1
>
>
> We have a database, which is causing serious trouble with compaction and 
> replication (huge memory and cpu usage, often causing couchdb to crash b/c 
> all system memory is exhausted). Yesterday we discovered that db/_all_docs is 
> reporting duplicated IDs (see [1]). Until a few minutes ago we thought that 
> there were only a few duplicates, but today I took a closer look and I found 
> 10 IDs which sum up to a total of 922 duplicates. Some of them have only 1 
> duplicate, others have hundreds.
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 up to 14 conflicts)
> * compaction is run on a daily basis
> * several thousands updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used couchdb versions 1.0.0 and 1.0.x (*)
> Unfortunately the database's contents are confidential and I'm not allowed to 
> publish it.
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
> [*]
> There were two (old) servers (1.0.0) in production (already having the 
> replication and compaction issues). Then two servers (1.0.x) were added and 
> replication was set up to bring them in sync with the old production servers 
> since the two new servers were meant to replace the old ones (to update 
> node.js application code among other things).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
