[
https://issues.apache.org/jira/browse/COUCHDB-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885121#comment-13885121
]
Igor Klimer commented on COUCHDB-2040:
--------------------------------------
As per Robert's suggestion, I've tried replicating the database, and it handled
the corrupted document well: the whole process seems to have succeeded. The
original database is 130.2 GB with 1603291 documents; the replicated one is
121.9 GB with 1603279 documents (12 fewer; is it normal that the document count
changed?). But there is an error in the logs clearly showing the document
that's been giving me so much trouble:
[Wed, 29 Jan 2014 00:39:13 GMT] [error] [<0.28617.7>] Replicator: couldn't
write document `332720882465`, revision `1-32e947c4533449463d59a9caa8042677`,
to target database `ecrepo2`. Error: `md5_mismatch`.
So it seems that Benoit's hunch was right: it is an md5_mismatch.
I've checked the offending document: the attachment is a PDF (generated by us),
and it opened fine once, with a small glitch in the text. However, subsequent
requests seem to fail:
[Wed, 29 Jan 2014 07:54:04 GMT] [error] [<0.29843.7>] Uncaught error in HTTP
request: {error,
{badmatch,
<<164,221,186,162,
117,83,31,203,49,
201,48,72,186,166,
2,81>>}}
[Wed, 29 Jan 2014 07:54:04 GMT] [error] [<0.29843.7>] httpd 500 error response:
{"error":"badmatch","reason":"�ݺ�uS\u001F�1�0H��\u0002Q"}
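The `badmatch` value above is a 16-byte binary, which is exactly the size of an MD5 digest, so it is plausibly the checksum that failed to match (an assumption; the log does not say which value it is). A small sketch that renders the bytes copied from the log as hex and in the `md5-<base64>` form CouchDB shows in an attachment stub's `digest` field, so it can be compared by eye with the document's stub. `as_md5_forms` is a hypothetical helper name:

```python
import base64

# Bytes copied verbatim from the badmatch term in the log above.
BADMATCH_BYTES = [164, 221, 186, 162, 117, 83, 31, 203,
                  49, 201, 48, 72, 186, 166, 2, 81]


def as_md5_forms(values):
    """Return (hex, "md5-<base64>") renderings of a 16-byte digest."""
    raw = bytes(values)
    if len(raw) != 16:
        raise ValueError("an MD5 digest is 16 bytes, got %d" % len(raw))
    couch_style = "md5-" + base64.b64encode(raw).decode("ascii")
    return raw.hex(), couch_style


if __name__ == "__main__":
    hex_form, couch_form = as_md5_forms(BADMATCH_BYTES)
    print(hex_form)    # a4ddbaa275531fcb31c93048baa60251
    print(couch_form)  # compare with the "digest" field of the attachment stub
```

The same bytes explain the unreadable `reason` string in the 500 response: it is this raw binary dumped into JSON, not an encoding problem in the log.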
Interestingly, CouchDB seems to "hang": it doesn't return any error to the
client; the error shows up only in the logs.
Of course, this document does not exist in the replicated database.
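To see exactly which of the 12 documents did not make it into the target, one option is to compare the IDs listed by `_all_docs` on both databases. A minimal sketch, assuming an unauthenticated server at `http://localhost:5984` and the `ecrepo`/`ecrepo2` names from this comment; the helper names are hypothetical:

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumption: local server, no auth


def ids_from_all_docs(body):
    """Extract IDs from a parsed _all_docs response ({"rows": [{"id": ...}, ...]})."""
    return {row["id"] for row in body["rows"]}


def fetch_ids(db):
    """Return the set of document IDs listed by GET /{db}/_all_docs."""
    with urllib.request.urlopen("%s/%s/_all_docs" % (COUCH, db)) as resp:
        return ids_from_all_docs(json.load(resp))


def missing_ids(source_ids, target_ids):
    """IDs present in the source but absent from the target, sorted."""
    return sorted(source_ids - target_ids)


if __name__ == "__main__":
    for doc_id in missing_ids(fetch_ids("ecrepo"), fetch_ids("ecrepo2")):
        print(doc_id)
```

With about 1.6 million documents the ID sets fit in memory, but each `_all_docs` response is large; paging with `startkey` and `limit` would be gentler on the server.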
> Compaction fails when copying attachment
> ----------------------------------------
>
> Key: COUCHDB-2040
> URL: https://issues.apache.org/jira/browse/COUCHDB-2040
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Reporter: Igor Klimer
>
> Original discussion from the user mailing list:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201401.mbox/%3cd14f971a540b974bb75adc55f00f34ca69a35...@sex1.getback.ad2008r2.corp%3e
> Digest:
> During database compaction, the process fails at about 50% with the following
> error: http://pastebin.com/qeaZNHMj (CouchDB 1.2.0, Windows Server 2008 R2
> Enterprise).
> After upgrading the server and CouchDB, the error is still the same:
> http://pastebin.com/feJWu7bN (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux
> 3.8.0-33-generic x86_64)).
> There was one prior attempt at compaction that failed because of insufficient
> disk space: http://pastebin.com/S1URXN0p
> After this initial failure, I've made sure that there's sufficient disk space
> for the .compact file.
> The .compact file was always removed before trying compaction again.
> At the request of Robert Samuel Newson, I've also tried with an empty
> .compact file - the results were the same: http://pastebin.com/MJCgGM8C.
> Our I/O subsystem consists of several RAID5 arrays; the admins claim that
> they've been running error-free since inception ;) We have yet to run a
> parity check, since that would require taking the array offline, and I'd
> rather not do that without exhausting other options.
> Config files from the 1.2.0/Windows server (since that's where the fault must
> have occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
> Other than delayed_commits being left at its default of true, there are no
> options that could affect fsync() behaviour and the like.
> I've run:
> curl 'localhost:5984/ecrepo/_changes?include_docs=true'
> curl 'localhost:5984/ecrepo/_all_docs?include_docs=true'
> and both calls succeeded, which would suggest that an attachment with an
> incorrect checksum/length is to blame somewhere.
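Neither `_changes` nor `_all_docs` reads attachment bodies; with `include_docs=true` attachments come back as stubs carrying `length` and `digest`. So a clean run does not rule out a damaged attachment. One way to look for it is to fetch each attachment and compare its size and MD5 with the stub. A rough sketch, assuming an unauthenticated server at `http://localhost:5984` and the `ecrepo` database; `scan` and `stub_problems` are hypothetical names. It skips stubs that report an `encoding` (attachments CouchDB stored compressed), because for those it is not clear here which bytes the digest covers, and it uses a timeout since the request for the bad attachment was seen to hang:

```python
import base64
import hashlib
import json
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # assumption: local server, no auth
DB = "ecrepo"


def get(path, timeout=60):
    """GET a path from the server and return the raw response body."""
    with urllib.request.urlopen(COUCH + path, timeout=timeout) as resp:
        return resp.read()


def stub_problems(stub, body):
    """Compare fetched attachment bytes with the stub's length and md5 digest."""
    problems = []
    if stub.get("length") is not None and len(body) != stub["length"]:
        problems.append("length %d != %d" % (len(body), stub["length"]))
    digest = stub.get("digest", "")
    if digest.startswith("md5-"):
        actual = "md5-" + base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        if actual != digest:
            problems.append("digest %s != %s" % (actual, digest))
    return problems


def scan(db=DB):
    """Print every attachment whose body does not match its stub."""
    rows = json.loads(get("/%s/_all_docs?include_docs=true" % db))["rows"]
    for row in rows:
        doc = row["doc"]
        for name, stub in doc.get("_attachments", {}).items():
            if "encoding" in stub:
                continue  # stored compressed; skipped in this sketch
            path = "/%s/%s/%s" % (db, urllib.parse.quote(doc["_id"], safe=""),
                                  urllib.parse.quote(name, safe=""))
            try:
                problems = stub_problems(stub, get(path))
            except Exception as exc:  # timeouts, or 500s like the badmatch above
                problems = ["fetch failed: %r" % exc]
            if problems:
                print(doc["_id"], name, "; ".join(problems))


if __name__ == "__main__":
    scan()
```

For a database of this size you would page through `_all_docs` instead of loading every document in one response.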
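The quoted report notes that `delayed_commits` is left at its default of `true`. In CouchDB 1.x this key lives in the `[couchdb]` section of `default.ini`, and `local.ini` is read afterwards and overrides it. A sketch that mimics that layering with Python's `configparser` to check the effective value; this approximates CouchDB's own ini handling rather than reproducing it, and `effective_delayed_commits` is a hypothetical helper:

```python
import configparser


def effective_delayed_commits(*ini_texts):
    """Return [couchdb] delayed_commits after layering ini texts in order.

    Later texts override earlier ones, as local.ini overrides default.ini.
    Returns None if no text sets the key.
    """
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",),
                                       interpolation=None)
    for text in ini_texts:
        parser.read_string(text)  # sections from later texts are merged in
    return parser.getboolean("couchdb", "delayed_commits", fallback=None)


if __name__ == "__main__":
    default_ini = "[couchdb]\ndelayed_commits = true ; set to false to fsync first\n"
    local_ini = "[couchdb]\ndelayed_commits = false\n"
    print(effective_delayed_commits(default_ini))             # True
    print(effective_delayed_commits(default_ini, local_ini))  # False
```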
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)