[ 
https://issues.apache.org/jira/browse/COUCHDB-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993254#comment-13993254
 ] 

Alexander Shorin commented on COUCHDB-2236:
-------------------------------------------

This doesn't look related to COUCHDB-1780, since 1) the hash didn't actually 
get upgraded and 2) there is no way to upgrade it during replication (you 
would need to know the raw password string, which CouchDB cannot). To make 
sure, I wrote a dummy script that generated a few thousand users with random 
unicode passwords, replicated, randomly updated some of them, and replicated 
again - all user docs matched between local and remote.
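The exact script isn't attached to the ticket; a minimal sketch of the approach might look like this, assuming two local CouchDB instances (the URLs, ports, and user names are all placeholders):

```python
import json
import random
import string
import urllib.request

# Assumed endpoints for the two test instances -- adjust to your setup.
SOURCE = "http://127.0.0.1:5984"
TARGET = "http://127.0.0.1:5985"


def random_unicode_password(length=12):
    """Mix ASCII and non-ASCII code points to exercise the hashing path."""
    pool = string.ascii_letters + "пароль密码λπß"
    return "".join(random.choice(pool) for _ in range(length))


def put_json(url, doc):
    """PUT a JSON document and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return json.load(urllib.request.urlopen(req))


def create_user(base, name):
    """Create a _users doc; CouchDB hashes the 'password' field on write."""
    doc = {
        "_id": "org.couchdb.user:%s" % name,
        "name": name,
        "type": "user",
        "roles": [],
        "password": random_unicode_password(),
    }
    return put_json("%s/_users/%s" % (base, doc["_id"]), doc)


def replicate():
    """Trigger one-shot replication of _users from SOURCE to TARGET."""
    body = {"source": SOURCE + "/_users", "target": TARGET + "/_users"}
    req = urllib.request.Request(
        SOURCE + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return json.load(urllib.request.urlopen(req))
```

After running create/replicate/update/replicate, fetching each doc from both sides and comparing the JSON bodies (including salt and password_sha) is enough to show whether replication alone can diverge them.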

However, I see one problem in your gist. If you run diff between these 
documents you'll see that more than just password_sha / salt differ:
{code}
$ diff user_doc.json conflicted_user_doc.json
3c3
<   "_rev": "23-05a2fac720acacf8f7a4b44a230d9998",
---
>   "_rev": "23-fb0fe359408394f84ad8b7e91ca40146",
5,6c5,6
<   "salt": "065xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx239",
<   "password_sha": "916xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx3d6",
---
>   "salt": "d15xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0cc",
>   "password_sha": "988xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx275",
10c10
<   "date": "2014-05-02T17:37:44.240Z",
---
>   "date": "2014-04-23T17:43:36.190Z",
66c66
<       "rev": "23-05a2fac720acacf8f7a4b44a230d9998",
---
>       "rev": "23-fb0fe359408394f84ad8b7e91ca40146",
70c70
<       "rev": "22-587d079d14dcde4c2e76e9ca254861d8",
---
>       "rev": "22-f91eae9c93bcab5607bef11691dd9539",
128a129,131
>   ],
>   "_conflicts": [
>     "23-05a2fac720acacf8f7a4b44a230d9998"
130c133
< }
---
> }
{code}

Note that the `date` fields are also different, and the ancestry diverges 
starting from the 22-X revision. So I wonder: how different are the 22-X 
revisions for both docs, and what does the `date` field mean? As it stands, 
this looks like you really did introduce conflicts into the _users database. 
I hope new information will clarify this.
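One way to answer the 22-X question on the read slave is to fetch the doc with `?conflicts=true` and then pull each 22-X ancestor explicitly via `?rev=`. A rough sketch, using the revision ids from the diff above (the doc id and base URL are placeholders):

```python
import json
import urllib.request

# Hypothetical doc id -- substitute one of the affected users.
BASE = "http://localhost:5984/_users/org.couchdb.user:example"


def fetch(url):
    """GET a JSON document from CouchDB and parse it."""
    return json.load(urllib.request.urlopen(url))


def differing_fields(a, b):
    """Return the top-level keys whose values differ between two docs."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# 1) List the conflict revisions alongside the winning rev:
#    doc = fetch(BASE + "?conflicts=true")
#    print(doc["_rev"], doc.get("_conflicts"))
#
# 2) Fetch both 22-X ancestors and see exactly where they diverge:
#    old_a = fetch(BASE + "?rev=22-587d079d14dcde4c2e76e9ca254861d8")
#    old_b = fetch(BASE + "?rev=22-f91eae9c93bcab5607bef11691dd9539")
#    print(differing_fields(old_a, old_b))
```

If the 22-X bodies already differ in salt / password_sha / date, the divergence predates the replication to the 1.6.0 slave.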

Also, it would be awesome if you could confirm that you can reproduce this 
behaviour - that would help a lot in tracking down the issue.

> Weird _users doc conflict when replicating from 1.5.0 -> 1.6.0
> --------------------------------------------------------------
>
>                 Key: COUCHDB-2236
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2236
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>            Reporter: Isaac Z. Schlueter
>
> The upstream write-master for npm is a CouchDB 1.5.0.  (Since it is locked 
> down at the IP level, we're not at risk to the DOS fixed in 1.5.1.)
> All PUT/POST/DELETE requests are routed to this master box, as well as any 
> request with `?write=true` on the URL.  (Used for cases where we still do the 
> PUT/409/GET/PUT dance, rather than using a custom _update function.)
> This master box replicates to a replication hub.  The read slaves all 
> replicate from the replication hub.  Both the /registry and /_users databases 
> replicate continuously using a doc in the /_replicator database.
> As I understand it, since replication only goes in one direction, and all 
> writes go to the upstream master, conflicts should be impossible.
> We brought a 1.6.0 read slave online, version 1.6.0+build.fauxton-91-g5a2864b.
> On this 1.6.0 read slave (and only there), we're seeing /_users doc 
> conflicts, and it looks like it has a different password_sha and salt.  Here 
> is one such example: https://gist.github.com/isaacs/63f332a15109bbfdb8ac  
> (actual password_sha and salt mostly redacted, but enough bytes left in so 
> that you can see they're not matching.)
> A few weeks ago, this issue popped up, affecting about 400 user docs, and we 
> figured that it had to do with some instability or human error at the time 
> when that box was set up.  We deleted all of the conflicts, and verified that 
> all docs matched the upstream at that time.  We removed the /_replicator 
> entries, and re-created them using the same script we use to create them on 
> all the other read slaves.
> If this was just one or two docs, or happening across more of the read 
> slaves, I'd be more inclined to think that it has something to do with a 
> particular user, or our particular setup.  However, the /_replicator docs are 
> identical in the 1.6.0 box as on the other read slaves.  This is affecting 
> about 150 users, and only on that one box.
> We've taken the 1.6.0 read slave out of rotation for now, so it's not an 
> urgent issue for us.  If anyone wants to log in and have a look around, I can 
> grant access, but I hope that there's enough information here to track it 
> down.  Thanks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)