[
https://issues.apache.org/jira/browse/COUCHDB-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965114#action_12965114
]
Paul Joseph Davis commented on COUCHDB-968:
-------------------------------------------
Sorry for the delay; my flight got cancelled, I was rerouted, and I didn't make
it home until just now.
I'm not sure I quite follow what you mean by uncompacted here. Post-compaction,
when we see the issue in _all_docs, I would expect the duplicates to all have
the same update_seq. Pre-compaction, in _changes, I would expect the same
revision (I think, just guessing), because it's just iterating the
by_seqid_btree and displaying the update_seq from the actual #full_doc_info.
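For reference, here's roughly the shape of the iteration I mean. This is a
hypothetical sketch only: the #full_doc_info record matches my memory of
couch_db.hrl, but the function and its input are stand-ins, not the real
couch_db code:

    %% Hypothetical sketch: _changes-style iteration over the by-seq
    %% index. The feed reports whatever entries the index holds, so a
    %% duplicated entry shows up with the same seq and revision.
    -module(changes_sketch).
    -export([changes_rows/1]).

    -record(full_doc_info, {id, update_seq, deleted = false, rev_tree = []}).

    %% BySeqEntries stands in for a fold over the by_seqid_btree, in
    %% ascending update_seq order.
    changes_rows(BySeqEntries) ->
        [{Id, Seq} || #full_doc_info{id = Id, update_seq = Seq} <- BySeqEntries].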
As Bob Dionne noted in #couchdb, it's not entirely clear where the actual bug
is. Right now it's basically a combination of three things: couch_key_tree:stem
kinda sorta fails when merging two revision lists that exceed the rev_limit
setting. Once that fails, we hit another issue that results in two entries in
the by_seqid_btree, and then finally, compaction copies multiple docs into the
actual by_docid_btree.
After musing on it during the copious amounts of queueing I managed to
accomplish today, I think we should treat them as three bugs for now. My
proposed fixes are basically these:
1. Fix couch_key_tree:stem so that it takes into account the case where the
input write's rev path has a suffix that is a prefix of an existing edit path
(rough sketch after this list). This would avoid needing the rewrite that fixes
everything.
2. We need to figure out a way to fix the breakage of the update_seq. It's a
bit nebulous whether this is an actual bug, as the solution to #1 would fix all
known occurrences of this. I think the proper fix would be to revisit
couch_db_updater:merge_rev_trees and figure out a better way of picking the new
update_seq, which would basically need to detect whether an edit leaf was
changed and only update the update_seq if so (see the second sketch below).
3. Our btree implementation should probably check harder for the possibility of
adding duplicate keys. The basic bug is that it's possible in a single call to
query_modify. A simple solution that I've implemented (one that would impact
all calls to query_modify) would be to check the input list of actions for
duplicates: just iterate over the Actions list and find duplicate {Action,
Key, _Value} tuples, ignoring differing values (sketched below). Alternatively,
a check deep down in modify_kvnode could discard Action/Key pairs that are not
greater than the last entry in ResultNode, thereby selecting one of the actions
semi-randomly (or, alternatively, throw an error instead of discarding). I
think technically both are O(N), with N the size of the list of Actions that
were requested.
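To make #1 concrete, here's the check I have in mind, as a rough sketch. I'm
using a toy path representation ({Pos, Revs} with Pos the generation of the
first rev and Revs ordered oldest to newest) rather than couch_key_tree's
actual internals, so treat it as an illustration only:

    %% Hypothetical sketch for #1: detect when the tail of one stemmed
    %% path lines up with the head of another, which is the case the
    %% current stem/merge logic mishandles.
    -module(stem_sketch).
    -export([suffix_is_prefix/2]).

    suffix_is_prefix({PosA, RevsA}, {PosB, RevsB}) ->
        Offset = PosB - PosA,
        Offset >= 0 andalso
            Offset < length(RevsA) andalso
            lists:prefix(lists:nthtail(Offset, RevsA), RevsB).

E.g. suffix_is_prefix({1, [a, b, c]}, {3, [c, d]}) returns true, and the merge
step should then splice the two into a single {1, [a, b, c, d]} path instead of
growing a second tree.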
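For #2, the shape of the merge_rev_trees change would be something like the
following, again over a toy rev tree ({Pos, {RevId, Value, Children}} nodes)
rather than the real one:

    %% Hypothetical sketch for #2: only move to a new update_seq when
    %% the merge actually changed the set of edit leaves.
    -module(seq_sketch).
    -export([pick_update_seq/4]).

    pick_update_seq(OldTree, NewTree, OldSeq, NextSeq) ->
        case leaves(NewTree) =:= leaves(OldTree) of
            true  -> OldSeq;   %% no edit leaf changed; keep the old seq
            false -> NextSeq   %% a leaf changed; the doc gets the new seq
        end.

    leaves(Trees) ->
        lists:usort(
            lists:flatmap(fun({_Pos, Node}) -> node_leaves(Node) end, Trees)).

    node_leaves({Rev, _Val, []}) ->
        [Rev];
    node_leaves({_Rev, _Val, Children}) ->
        lists:flatmap(fun node_leaves/1, Children).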
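The duplicate check for #3 is the easiest to show. This standalone version is a
sketch of the idea rather than the actual patch; it keeps the first occurrence
of each {Action, Key} pair, which is exactly the kind of semi-random pick
mentioned above:

    %% Sketch for #3: drop actions that repeat an {Action, Key} pair,
    %% ignoring the value, before the btree ever sees them. With a
    %% hash set this is effectively O(N) in the length of Actions.
    -module(dedupe_sketch).
    -export([dedupe_actions/1]).

    dedupe_actions(Actions) ->
        {RevDeduped, _Seen} =
            lists:foldl(
                fun({Action, Key, _Value} = Tuple, {Acc, Seen}) ->
                    case sets:is_element({Action, Key}, Seen) of
                        true ->
                            {Acc, Seen};  %% duplicate {Action, Key}; drop it
                        false ->
                            {[Tuple | Acc],
                             sets:add_element({Action, Key}, Seen)}
                    end
                end,
                {[], sets:new()},
                Actions),
        lists:reverse(RevDeduped).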
That is all. I'll look more tomorrow. Right now it's time for beer and a bit of
zoning out in front of the telly before I pass out.
> Duplicated IDs in _all_docs
> ---------------------------
>
> Key: COUCHDB-968
> URL: https://issues.apache.org/jira/browse/COUCHDB-968
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.10.1, 0.10.2, 0.11.1, 0.11.2, 1.0, 1.0.1, 1.0.2
> Environment: Ubuntu 10.04.
> Reporter: Sebastian Cohnen
> Priority: Blocker
>
> We have a database which is causing serious trouble with compaction and
> replication (huge memory and CPU usage, often causing CouchDB to crash b/c
> all system memory is exhausted). Yesterday we discovered that db/_all_docs is
> reporting duplicated IDs (see [1]). Until a few minutes ago we thought there
> were only a few duplicates, but today I took a closer look and found 10 IDs
> which sum up to a total of 922 duplicates. Some of them have only 1
> duplicate; others have hundreds.
> Some facts about the database in question:
> * ~13k documents, with 3-5k revs each
> * all duplicated documents are in conflict (with 1 to 14 conflicts)
> * compaction is run on a daily basis
> * several thousand updates per hour
> * multi-master setup with pull replication from each other
> * delayed_commits=false on all nodes
> * used couchdb versions 1.0.0 and 1.0.x (*)
> Unfortunately, the database's contents are confidential and I'm not allowed
> to publish them.
> [1]: Part of http://localhost:5984/DBNAME/_all_docs
> ...
> {"id":"9997","key":"9997","value":{"rev":"6096-603c68c1fa90ac3f56cf53771337ac9f"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> {"id":"9999","key":"9999","value":{"rev":"6097-3c873ccf6875ff3c4e2c6fa264c6a180"}},
> ...
> [*]
> There were two (old) servers (1.0.0) in production (already exhibiting the
> replication and compaction issues). Then two servers (1.0.x) were added, and
> replication was set up to bring them in sync with the old production servers,
> since the two new servers were meant to replace the old ones (to update
> node.js application code, among other things).