[
https://issues.apache.org/jira/browse/COUCHDB-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143266#comment-14143266
]
Robert Newson commented on COUCHDB-2338:
----------------------------------------
The "completely random" assertion is false in almost all cases: we've had MD5s
for attachments for a very long while. The random value is only generated for
databases that predate that. The code you cite shows that the MD5s are mixed
in where available.
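As a rough illustration of the two branches in the quoted function, here is a minimal Python sketch. It mirrors only the control flow: the random fallback is taken when at least one attachment has no MD5, and otherwise the attachment MD5s are mixed into a deterministic hash. The `serialize` parameter is a hypothetical stand-in for `term_to_binary` and does not produce CouchDB-compatible revisions.

```python
import hashlib
import os


def new_revid(deleted, old_start, old_revs, body, atts, serialize):
    """Mirror the branch structure of the quoted Erlang new_revid/1.

    atts is a list of dicts with "name", "type" and "md5" (bytes) keys.
    serialize is a caller-supplied stand-in for term_to_binary that must
    return bytes; it is an assumption of this sketch, not CouchDB's encoder.
    """
    # Keep only attachments that carry an MD5 digest.
    atts2 = [(a["name"], a["type"], a["md5"]) for a in atts if a["md5"] != b""]
    if len(atts2) != len(atts):
        # At least one old-style attachment without an MD5:
        # fall back to a random 32-bit value rendered as decimal digits.
        return str(int.from_bytes(os.urandom(4), "big")).encode("ascii")
    old_rev = old_revs[0] if old_revs else 0
    payload = serialize([deleted, old_start, old_rev, body, atts2])
    return hashlib.md5(payload).digest()
```

With a deterministic `serialize` (for example `lambda t: repr(t).encode("utf-8")`), two calls with the same arguments return the same 16-byte digest as long as every attachment has an MD5. Adding a single attachment whose `md5` is `b""` switches the result to a random decimal string.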
Sidenote: we should be generalising all the checksumming anyway but with a view
to removing MD5.
> Reproducible document revision hash calculation
> -----------------------------------------------
>
> Key: COUCHDB-2338
> URL: https://issues.apache.org/jira/browse/COUCHDB-2338
> Project: CouchDB
> Issue Type: Improvement
> Security Level: public(Regular issues)
> Components: Database Core
> Reporter: Alexander Shorin
>
> Current document revision hash implementation is very Erlang-specific:
> {code}
> new_revid(#doc{body=Body,revs={OldStart,OldRevs},
>         atts=Atts,deleted=Deleted}) ->
>     case [{N, T, M} || #att{name=N,type=T,md5=M} <- Atts, M =/= <<>>] of
>     Atts2 when length(Atts) =/= length(Atts2) ->
>         % We must have old style non-md5 attachments
>         ?l2b(integer_to_list(couch_util:rand32()));
>     Atts2 ->
>         OldRev = case OldRevs of [] -> 0; [OldRev0|_] -> OldRev0 end,
>         couch_util:md5(term_to_binary([Deleted, OldStart, OldRev, Body,
>             Atts2]))
>     end.
> {code}
> All the pieces of the code above are trivial to implement in any programming
> language except the {{term_to_binary}} function: to get it right you need to
> dive deeper into Erlang. I have nothing against that, Erlang is cool, but this
> implementation detail turns reproducing a document revision into a
> non-trivial, complex operation.
> Rationale: you want to build a CouchDB-compatible storage on a technology
> stack other than Erlang that will "sync" with CouchDB without worrying about
> mismatched revisions for the same content with the same modification history
> made in different "compatible" storages.
> P.S. Oh, yes, if you update attachments (add/del), the revision becomes
> completely random. Moreover, if you just update an attachment for a document,
> there is something specific about the revision calculation that I don't
> recall now, but it would be easy to notice by looking at what the specified
> function takes when called.
> P.P.S. via https://twitter.com/janl/status/514019496110333952
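To give a sense of what reproducing `term_to_binary` outside Erlang involves, here is a partial Python sketch of the Erlang external term format. It covers only small integers, 32-bit integers, atoms, binaries, small tuples and proper lists. The `Atom` class and `encode_term` function are hypothetical names for this sketch. Floats, bignums, maps and large tuples are left out. The atom tag used here (`ATOM_EXT`) is an assumption that may not match every Erlang/OTP release, so this is not a drop-in replacement for the real function.

```python
import struct


class Atom(str):
    """Marks a Python string that should be encoded as an Erlang atom."""


def encode_term(term):
    # 131 is the version byte that prefixes every external-format term.
    return bytes([131]) + _encode(term)


def _encode(term):
    if isinstance(term, bool):
        # Erlang booleans are the atoms 'true' and 'false'.
        return _encode(Atom("true" if term else "false"))
    if isinstance(term, Atom):
        name = term.encode("latin-1")
        # ATOM_EXT (100): 2-byte length, then the name.
        return bytes([100]) + struct.pack(">H", len(name)) + name
    if isinstance(term, int):
        if 0 <= term <= 255:
            return bytes([97, term])  # SMALL_INTEGER_EXT
        if -2 ** 31 <= term < 2 ** 31:
            return bytes([98]) + struct.pack(">i", term)  # INTEGER_EXT
        raise ValueError("bignums are out of scope for this sketch")
    if isinstance(term, bytes):
        # BINARY_EXT (109): 4-byte length, then the data.
        return bytes([109]) + struct.pack(">I", len(term)) + term
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("large tuples are out of scope for this sketch")
        # SMALL_TUPLE_EXT (104): 1-byte arity, then the elements.
        return bytes([104, len(term)]) + b"".join(_encode(t) for t in term)
    if isinstance(term, list):
        if not term:
            return bytes([106])  # NIL_EXT, the empty list
        if len(term) <= 65535 and all(
            type(x) is int and 0 <= x <= 255 for x in term
        ):
            # A list of bytes is packed as STRING_EXT (107), not LIST_EXT.
            return bytes([107]) + struct.pack(">H", len(term)) + bytes(term)
        # LIST_EXT (108): 4-byte length, the elements, then the NIL_EXT tail.
        body = b"".join(_encode(t) for t in term)
        return bytes([108]) + struct.pack(">I", len(term)) + body + bytes([106])
    raise TypeError("unsupported term type: %r" % type(term))
```

The `STRING_EXT` branch is the kind of detail that is easy to miss: the list `[1, 2, 3]` and the list `[False, 1, 0]` get different tags even though both are lists. Any hash computed over these bytes depends on getting every such rule right.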