Thanks Randall :)

On Nov 17, 2011, at 23:57, Randall Leeds wrote:
> On Wed, Nov 16, 2011 at 09:46, Alex Besogonov <[email protected]>
> wrote:
>> On Tue, Nov 15, 2011 at 4:23 PM, Randall Leeds <[email protected]>
>> wrote:
>>>> Remember that the _rev value is derived from the contents of the
>>>> documents, all the bytes of all attachments and values from previous
>>>> revisions. Stock MD5 preimage attacks are of a much simpler form
>>>> (finding a Y such that MD5(Y)=X for some desired X). Also, you
>>>> would have to arrange for the same number of updates as well, since
>>>> the number at the front is incremented on each successful update.
>>> Also remember that the contents would have to parse as JSON, so that
>>> restricts this search space even further.
>> Not really. Binary representation of JSON is used to calculate the hash.
>>
>> So I can make a document like this:
>> ===
>> {
>> "aa" : "xxxxxxxxxxxxxx.....[several thousand x's]"
>> }
>> ===
>>
>> And use the large 'xxx...x' string as a scratch area for my attack. I don't
>> even need to bother with quoting issues because CouchDB is going to
>> unquote everything during JSON parsing. And there are no other hash
>> codes to work around (working around even two MD5s at the same time
>> is much harder).
>>
>> That's about the best possible case for an attacker.
>
> This "attack", though, is still pretty hard, and, I think, not an
> attack. The document _does_ have to take a trip through a JSON parser,
> pass as valid JSON, but create an MD5 sum, along with the metadata,
> that matches the revision id of the original document. All this needs
> to be done on a Couch that is trusted to perform unfiltered,
> bi-directional replication and allows the attacker to change documents
> that matter to other people.
>
> The proper way to stop the "attack" is to not let users modify
> documents that will screw up things for other people.
> It's kind of like how a UNIX user is _welcome_ to trash their .bashrc,
> and just because their home directory is mounted over NFS and now their
> .bashrc is trashed _everywhere_ doesn't mean they've really done any
> damage from anyone else's point of view. They didn't attack anything
> but themselves.
>
> ----
>
> However. It's worth noting that an attacker can just make up whatever
> revision identifiers they want to, without dealing with the MD5 stuff
> anyway!!! Passing ?new_edits=false allows an "attacker" to specify
> that a document has any revision they want, with whatever history of
> revisions they want.
>
> curl -XPUT -H"Content-Type: application/json" \
>   'http://some.couch/somedb/document?new_edits=false' \
>   -d'{"_id":"document", "_rev":"5-anything",
>   "_revisions":{"start":5,"ids":["anything",
>   "everything","bogus","revids"]}}'
>
> (Side note to devs: we may want to deterministically prune the leaves
> for duplicates after merging rev trees, or not, because, well, this is
> a crazy hand-crafted fake-out and caveat power-user.)
>
> In fact, I just discovered yesterday that you can create unreachable
> conflicts this way, by giving them revision ids and histories that
> create two branches with identical leaves but different stems. If
> CouchDB did decide to enforce some crypto-verifiable constraints on
> revision ids, they could be checked to prevent this kind of
> mis-history. However, other implementations would be forced to follow
> the same scheme. I think the intention of making the revision ID
> opaque was to make it an implementation detail and specifically _not_
> a security or validation feature.
>
> That said, I'm starting to come around to this idea. I'd be happy to
> see patches that enable a "strict revisions mode" for CouchDB. I don't
> feel like CouchDB has made any promises that are broken by using MD5,
> but additional promises could possibly be made if we took a git-like
> approach to revision crypto.
>
> I hope that settles the "why", reassures any
> "oh-my-god-my-couch-is-vulnerable", and motivates the
> "hey-lets-make-a-patch" if you still want the feature, with the
> understanding that it's unlikely the project will specify this as a
> necessary condition for general-purpose replication. If you have more
> bullet-proof needs, dev that armor up and I'll review it, but I'd
> advise making it a config option.
>
> -Randall
>
>>
>>> Then, if I understand Jason
>>> correctly, we're also talking about a situation where Couch B is
>>> insecure... it's allowing a malicious user to change documents. If
>>> these documents are anything more important than something affecting
>>> the user herself then what you have is a malicious administrator or an
>>> insecure deployment. I don't think MD5 is to blame here.
>> No, the issue here is a possibility to break the synchronization.
>>
>>> Does that sound like a reasonable assessment to you, Alex?
>> Almost.
>>
>>> Also, I'd love to hear about your C++ replicator as it develops.
>> Sure, I'm developing a very small and fast embedded storage for mobile
>> devices and desktop apps. It'll be open source once I finish its core.
>>
>>> -Randall
>>>
>>>> For switching from MD5 to SHA-1, I say no. If we switch, let's use
>>>> something contemporary like SHA-256. Better yet, let's wait for the
>>>> winner of the SHA-3 competition.
>>>>
>>>> B.
>>>>
>>>> On 15 November 2011 07:57, Jason Smith <[email protected]> wrote:
>>>>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>>>>> <[email protected]> wrote:
>>>>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>>>>> with new md5 hash.
>>>>>>>> A malicious software somehow learns about this update and creates
>>>>>>>> another document
>>>>>>>> on machine B, contriving it so to make the resulting hash to be the
>>>>>>>> same as on machine A.
>>>>>>> Before going any further, you must show why we care about the contents
>>>>>>> of machine B.
>>>>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>>>>> I clone your Git repository if I do not know you?
>>>>>> The problem is, MD5 hash depends on _untrusted_ data that external
>>>>>> processes might put into the database.
>>>>>>
>>>>>> For example, imagine that machines A and B use CouchDB to store
>>>>>> certificates.
>>>>>
>>>>> I ask again.
>>>>>
>>>>> --
>>>>> Iris Couch
>>>>>
>>>>
>>>
>>
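
For anyone skimming the thread, here is a minimal Python sketch of the ?new_edits=false request Randall describes in his message. It only builds the URL and JSON body and sends nothing. The helper names (forged_revision_body, put_url) are made up for illustration and are not part of CouchDB or any client library, and the check that "start" is at least the number of ids is my own sanity assumption, not something stated in the thread. The point it shows is that no hash is computed anywhere: the caller simply picks the revision ids.

```python
import json
from urllib.parse import quote


def forged_revision_body(doc_id, start, rev_ids, **fields):
    """Build a document body with a caller-chosen revision history.

    rev_ids lists the hash parts newest first, as in the curl example;
    start is the generation number of the newest revision.
    (Hypothetical helper, for illustration only.)
    """
    if not rev_ids:
        raise ValueError("need at least one revision id")
    # Assumption: generations count down from start and cannot go below 1.
    if start < len(rev_ids):
        raise ValueError("start must be >= number of revision ids")
    body = dict(fields)
    body["_id"] = doc_id
    # The _rev is just "<generation>-<id>"; nothing is hashed here.
    body["_rev"] = f"{start}-{rev_ids[0]}"
    body["_revisions"] = {"start": start, "ids": list(rev_ids)}
    return body


def put_url(base, db, doc_id):
    """URL for the PUT, with new_edits=false as in the curl example."""
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}?new_edits=false"


body = forged_revision_body(
    "document", 5, ["anything", "everything", "bogus", "revids"]
)
payload = json.dumps(body)
url = put_url("http://some.couch", "somedb", "document")
```

Here payload and url correspond to the -d argument and the URL of the curl command quoted above.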
