With this patch applied, in ~99% of cases, yes.

Best,
Adam

On Dec 8, 2010, at 10:54 AM, Sebastian Cohnen wrote:

> do I read this correctly and two normal compaction runs will take care of
> dupes in both, _all_docs and _changes?
>
> On 08.12.2010, at 16:48, [email protected] wrote:
>
>> Author: kocolosk
>> Date: Wed Dec 8 15:48:52 2010
>> New Revision: 1043461
>>
>> URL: http://svn.apache.org/viewvc?rev=1043461&view=rev
>> Log:
>> Usort the infos during compaction to remove dupes, COUCHDB-968
>>
>> This is not a bulletproof solution; it only removes dupes when
>> they appear in the same batch of 1000 updates. However, for dupes
>> that show up in _all_docs the probability of that happening is quite
>> high. If the dupes are only in _changes a user may need to compact
>> twice, once to get the dupes ordered together and a second time to
>> remove them.
>>
>> A more complete solution would be to trigger the compaction in "retry"
>> mode, but this is significantly slower.
>>
>> Modified:
>> couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
>>
>> Modified: couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl
>> URL:
>> http://svn.apache.org/viewvc/couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl?rev=1043461&r1=1043460&r2=1043461&view=diff
>> ==============================================================================
>> --- couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl (original)
>> +++ couchdb/branches/1.1.x/src/couchdb/couch_db_updater.erl Wed Dec 8 15:48:52 2010
>> @@ -775,7 +775,10 @@ copy_rev_tree_attachments(SrcDb, DestFd,
>>      end, Tree).
>>
>>
>> -copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq, Retry) ->
>> +copy_docs(Db, #db{fd=DestFd}=NewDb, InfoBySeq0, Retry) ->
>> +    % COUCHDB-968, make sure we prune duplicates during compaction
>> +    InfoBySeq = lists:usort(fun(#doc_info{id=A}, #doc_info{id=B}) -> A =< B end,
>> +        InfoBySeq0),
>>      Ids = [Id || #doc_info{id=Id} <- InfoBySeq],
>>      LookupResults = couch_btree:lookup(Db#db.fulldocinfo_by_id_btree, Ids),
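
To illustrate why the fix in the quoted commit only catches dupes that land in the same batch, here is a minimal sketch. It reuses the ordering fun from the patch, but everything else is an assumption made for illustration: the module name, the cut-down #doc_info record (the real CouchDB record has more fields), and the compact_in_batches/2 helper, which is hypothetical and is not how couch_db_updater actually drives compaction.

```erlang
-module(usort_dedupe_sketch).
-export([dedupe_batch/1, compact_in_batches/2]).

%% Simplified stand-in for CouchDB's #doc_info record; the real record
%% carries more fields. This definition is an assumption for illustration.
-record(doc_info, {id, high_seq}).

%% Same ordering fun as in the patch: two entries compare equal when their
%% ids are equal, so lists:usort/2 keeps only one entry per id.
dedupe_batch(Infos) ->
    lists:usort(fun(#doc_info{id = A}, #doc_info{id = B}) -> A =< B end, Infos).

%% Hypothetical helper: split Infos into batches of BatchSize (a positive
%% integer), dedupe each batch on its own, and concatenate the results.
%% Dupes that fall into different batches are not removed.
compact_in_batches(Infos, BatchSize) when length(Infos) =< BatchSize ->
    dedupe_batch(Infos);
compact_in_batches(Infos, BatchSize) ->
    {Batch, Rest} = lists:split(BatchSize, Infos),
    dedupe_batch(Batch) ++ compact_in_batches(Rest, BatchSize).
```

Given four entries with ids a, a, b, a (in that order), a batch size of 2 still leaves two entries for id a, because the last dupe sits in a different batch from the first two. A batch size of 4 leaves one entry per id. That matches the commit log's caveat that a second compaction may be needed once the dupes have been ordered together.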
