Alexey Loshkarev created COUCHDB-2329:
-----------------------------------------
Summary: Log broken file name on compress/decompress error
Key: COUCHDB-2329
URL: https://issues.apache.org/jira/browse/COUCHDB-2329
Project: CouchDB
Issue Type: Improvement
Security Level: public (Regular issues)
Components: Database Core, Logging
Reporter: Alexey Loshkarev
Hello.
I'm using CouchDB for a fairly large set of databases: over 50 databases with
more than 500 million documents in total and about 2 TB on disk. I run it on a
cluster of 4 nodes.
As this is real life, hardware errors happen from time to time. Most of them
don't affect CouchDB, but some do: CouchDB writes wrong data to disk, or reads
garbage back from it because of disk read errors.
The bad thing is that CouchDB dies the moment it can't decompress data.
The worse thing is that CouchDB doesn't log the name of the broken file, which
would help me with this problem. If CouchDB showed me the broken file name, I
could delete it and recreate it via replication from a healthy node.
The ugly thing is that I have to drop the whole node and re-replicate it. But in
my situation, replicating 2 TB takes over a month! So the usual state of my
cluster is 3 nodes up and the fourth replicating terabytes of data.
So my proposal is to include the file name in the log message when CouchDB
fails to decompress data.
Sample message:
[Mon, 15 Sep 2014 11:51:17 GMT] [error] [emulator] Error in process <0.24789.1>
with exit value: {function_clause,[{couch_compress,decompress,[<<1952804468
bytes>>],[{file,"couch_compress.erl"},{line,67}]},{couch_file,pread_term,2,[{file,"couch_file.erl"},{line,135}]},{couch_btree,get_node,2,[{file,"couch_btree.erl"},{line,349}]},{couch_btree,modify_node...
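To illustrate the proposal, here is a minimal sketch of the idea in Python. It is not CouchDB code (CouchDB is written in Erlang), and the names decompress_chunk and CorruptChunkError, the example path, and the use of zlib as the compression format are all assumptions made for the illustration. The point is only that the layer which knows the file path catches the decompression failure and reports the path and offset before failing.

```python
import logging
import zlib

log = logging.getLogger("storage")


class CorruptChunkError(Exception):
    """Raised when a chunk read from a file cannot be decompressed (hypothetical)."""

    def __init__(self, path, offset, cause):
        super().__init__(
            f"cannot decompress data in {path} at offset {offset}: {cause}"
        )
        self.path = path
        self.offset = offset


def decompress_chunk(path, offset, raw):
    """Decompress raw bytes that were read from `path` at `offset`.

    zlib stands in for whatever compression the storage engine uses.
    On failure, the file name and offset are logged and attached to
    the raised error, so the operator knows which file is broken.
    """
    try:
        return zlib.decompress(raw)
    except zlib.error as exc:
        log.error(
            "decompress failed: file=%s offset=%d bytes=%d: %s",
            path, offset, len(raw), exc,
        )
        raise CorruptChunkError(path, offset, exc) from exc
```

With something like this, the log would name the damaged file, so only that one database would need to be removed and re-replicated from a healthy node instead of the whole node.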