On Tue, Nov 15, 2011 at 01:43, Robert Newson <[email protected]> wrote:
> _rev values used to be UUIDs and became deterministic to improve
> replication performance. I can see that there's a theoretical issue
> where replication could be inhibited, though I question how practical
> it is given the internal details of _rev calculation.
>
> Remember that the _rev value is derived from the contents of the
> documents, all the bytes of all attachments and values from previous
> revisions. Stock MD5 preimage attacks are of a much simpler form
> (finding a Y such that MD5(Y)=X for some desired X). Also that you
> would have to arrange for the same number of updates as well, since
> the number at the front is incremented on each successful update.
>
Also remember that the contents would have to parse as JSON, so that
restricts this search space even further.

Then, if I understand Jason correctly, we're also talking about a
situation where Couch B is insecure... it's allowing a malicious user
to change documents. If these documents are anything more important
than something affecting the user herself then what you have is a
malicious administrator or an insecure deployment. I don't think MD5
is to blame here.

Does that sound like a reasonable assessment to you, Alex?

Also, I'd love to hear about your C++ replicator as it develops.

-Randall

> For switching from MD5 to SHA-1, I say no. If we switch, let's use
> something contemporary like SHA-256. Better yet, let's wait for the
> winner of the SHA-3 competition.
>
> B.
>
> On 15 November 2011 07:57, Jason Smith <[email protected]> wrote:
>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>> <[email protected]> wrote:
>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>> with new md5 hash.
>>>>> A malicious software somehow learns about this update and creates
>>>>> another document
>>>>> on machine B, contriving it so to make the resulting hash to be the
>>>>> same as on machine A.
>>>> Before going any further, you must show why we care about the contents
>>>> of machine B.
>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>> I clone your Git repository if I do not know you?
>>> The problem is, MD5 hash depends on _untrusted_ data that external
>>> processes might put into the database.
>>>
>>> For example, imagine that machines A and B use CouchDB to store
>>> certificates.
>>
>> I ask again.
>>
>> --
>> Iris Couch
>>
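
For readers following the thread, the deterministic _rev that Robert describes (a generation number at the front, followed by a hash that depends on the previous revision, the document body, and the attachment bytes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name next_rev, the canonical-JSON encoding, and the order in which fields are hashed are all made up here, and CouchDB's own implementation serializes the data differently, so these ids will not match real CouchDB revisions.

```python
import hashlib
import json


def next_rev(prev_rev, body, attachments=()):
    """Return a '<N>-<md5hex>' style revision id (hypothetical helper)."""
    if prev_rev is None:
        generation, prev_hash = 0, ""
    else:
        gen_text, prev_hash = prev_rev.split("-", 1)
        generation = int(gen_text)

    digest = hashlib.md5()
    # The previous revision's hash feeds into the new one.
    digest.update(prev_hash.encode("utf-8"))
    # Assumed canonical form: sorted keys, no extra whitespace.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    # Every byte of every attachment contributes as well.
    for blob in attachments:
        digest.update(hashlib.md5(blob).digest())

    # The number at the front is incremented on each update.
    return f"{generation + 1}-{digest.hexdigest()}"


rev1 = next_rev(None, {"name": "Doc"})
rev2 = next_rev(rev1, {"name": "Doc", "edited": True})
```

Two machines that apply the same edit to the same parent revision get the same id without exchanging data, which is the replication benefit mentioned above. To collide with a given rev2, an attacker would need a body that parses as JSON, sits at the same generation number, and hashes to the same value together with the parent revision.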
