>> Now I make a change to 'Doc' at machine A. This creates a new revid
>> with new md5 hash.
>> A malicious software somehow learns about this update and creates
>> another document
>> on machine B, contriving it so to make the resulting hash to be the
>> same as on machine A.

> Before going any further, you must show why we care about the contents
> of machine B.
> Why would I log in to machine B if I do not trust B's owner? Why would
> I clone your Git repository if I do not know you?

The problem is that the MD5 hash depends on _untrusted_ data that external processes can put into the database.
For example, imagine that machines A and B use CouchDB to store certificates. On machine A, an administrator issues a certificate revocation record for a certificate stored in 'Doc2'. On machine B, malware issues a no-op update of 'Doc2' that is contrived to get the same revision ID as the revocation record issued on machine A (using only normal document management functionality). Such tampering would normally be noticed as a conflict during replication, but in this case it would go unnoticed: replication sees that machine B already has that revision ID and skips it, so the revocation never reaches B.

This is a somewhat contrived example, but for me the crucial fact is that external, potentially untrusted data can force CouchDB to behave incorrectly and violate its invariants. It can be dismissed as a minor issue, of course, but the fix is simple: just switch from MD5 to something more secure, such as SHA-1.

> Finally, revision tokens might look like MD5, but they are not. They
> especially look like MD5 if you read the source code. But they are not
> MD5. They are opaque tokens. They do not serve a security function.
> Between trusted nodes, they indicate document changes.

I'm actually writing a connector between CouchDB and an external system, so I'm reimplementing all the functionality required by the synchronization protocol from scratch (in C++). It is quite an interesting task.
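To make the certificate example concrete, here is a minimal C++ sketch of why a colliding revision token hides the conflict. It is a toy model, not CouchDB's actual replicator: the target decides whether it needs an incoming revision by its token alone (much like CouchDB's `_revs_diff` step), so the body attached to a known token is never compared. `DocReplica`, `replay_scenario` and the tokens `"1-base"`, `"2-abc"` and `"2-def"` are hypothetical, and real CouchDB keeps a full revision tree rather than only the leaves.

```cpp
#include <map>
#include <string>

// Hypothetical, heavily simplified model of one document on one replica.
// Only leaf revisions are kept (token -> body); real CouchDB stores a
// full revision tree and deterministically picks a winner among leaves.
struct DocReplica {
    std::map<std::string, std::string> leaves;

    // Local edit: the new revision replaces its parent as a leaf.
    void update(const std::string& parent, const std::string& rev,
                const std::string& body) {
        leaves.erase(parent);
        leaves[rev] = body;
    }

    // Incoming replicated revision. In the spirit of a revs_diff check,
    // only the token is compared: if it is already known, the incoming
    // body is never looked at. Returns true if the revision was applied.
    bool replicate(const std::string& parent, const std::string& rev,
                   const std::string& body) {
        if (leaves.count(rev) > 0) return false;  // "already have it"
        update(parent, rev, body);  // erase() is a no-op if parent is gone
        return true;
    }

    // More than one leaf means the replicas diverged: a visible conflict.
    bool in_conflict() const { return leaves.size() > 1; }
};

// Replays the certificate scenario. Machine A revokes the certificate as
// revision "2-abc"; machine B makes a no-op edit that gets token b_rev.
// Returns B's state after A's revision has been replicated into it.
DocReplica replay_scenario(const std::string& b_rev) {
    DocReplica b;
    b.leaves["1-base"] = "cert";                // both replicas start here
    b.update("1-base", b_rev, "no-op");         // B's local (maybe forged) edit
    b.replicate("1-base", "2-abc", "revoked");  // A's revocation arrives
    return b;
}
```

With an honest token, `replay_scenario("2-def").in_conflict()` is true and both leaves survive, so someone gets to see the divergence. With a forged token, `replay_scenario("2-abc").in_conflict()` is false and the single leaf still holds `"no-op"`: the revocation is silently dropped, which is exactly the invariant violation described above.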
