Re: Why MD5 is used for hashes, also about non-deterministic IDs.

Jason Smith Tue, 15 Nov 2011 15:55:08 -0800

On Wed, Nov 16, 2011 at 4:23 AM, Randall Leeds <[email protected]> wrote:
> On Tue, Nov 15, 2011 at 01:43, Robert Newson <[email protected]> wrote:
>> _rev values used to be UUIDs and became deterministic to improve
>> replication performance. I can see that there's a theoretical issue
>> where replication could be inhibited, though I question how practical
>> it is given the internal details of _rev calculation.
>>
>> Remember that the _rev value is derived from the contents of the
>> documents, all the bytes of all attachments and values from previous
>> revisions. Stock MD5 preimage attacks are of a much simpler form
>> (finding a Y such that MD5(Y)=X for some desired X). You would also
>> have to arrange for the same number of updates, since the number at
>> the front is incremented on each successful update.
>>
>
> Also remember that the contents would have to parse as JSON, so that
> restricts this search space even further. Then, if I understand Jason
> correctly, we're also talking about a situation where Couch B is
> insecure... it's allowing a malicious user to change documents. If
> these documents are anything more important than something affecting
> the user herself then what you have is a malicious administrator or an
> insecure deployment. I don't think MD5 is to blame here.


That is my understanding. I don't think MD5 is relevant. You could
modify couch B's source code to give you whatever _rev trees you want.
The trick is pushing that back to couch A.
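
To make Robert's point about _rev derivation concrete, here is a rough
sketch in Python. It is not CouchDB's actual code or encoding (the real
thing hashes an Erlang term); next_rev, its arguments, and the JSON
serialization are all made up for illustration. It only shows the
general shape: the hash covers the body, the attachment digests, and
the previous revision, and the number in front goes up by one on each
update.

```python
import hashlib
import json


def next_rev(prev_rev, body, attachment_digests=()):
    """Hypothetical deterministic revision ID of the form 'N-<md5 hex>'.

    Assumption: inputs are serialized as canonical JSON. CouchDB itself
    uses a different (Erlang term) encoding.
    """
    if prev_rev is None:
        generation, prev_hash = 0, ""
    else:
        gen_str, prev_hash = prev_rev.split("-", 1)
        generation = int(gen_str)

    # Canonical serialization so identical content gives identical bytes.
    payload = json.dumps(
        [prev_hash, body, sorted(attachment_digests)],
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")

    digest = hashlib.md5(payload).hexdigest()
    return f"{generation + 1}-{digest}"


r1 = next_rev(None, {"name": "alice"})
r2 = next_rev(r1, {"name": "alice", "age": 30})
```

Two nodes that apply the same edits to the same history get the same
_rev under this scheme, which is the replication win. A forged document
would have to match the hash and also sit at the same generation number
with the same parent.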

> Does that sound like a reasonable assessment to you, Alex?
>
> Also, I'd love to hear about your C++ replicator as it develops.

Alex, a C++ replicator is super exciting! I don't want to change the
subject, but another replicator implementation would be brilliant!


-- 
Iris Couch
