[
https://issues.apache.org/jira/browse/COUCHDB-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152017#comment-13152017
]
Nils Breunese commented on COUCHDB-1343:
----------------------------------------
I discussed this issue in #couchdb on Freenode. Here's the transcript:
----
<breun> Hi all. I have a production database here and if I POST to
/db/_view_cleanup I get a 500 Internal Server Error like this:
http://friendpaste.com/7YSu5IImVg6HAWzmIrqqG3
[10:32] <breun> That 'reason' doesn't make much sense to me. What's next?
[11:02] <+jan____> breun: lemme see
[11:03] <+jan____> breun: this looks like heavy IO?
[11:12] <breun> jan____: Hm, I only get this error for this one database. There
are other databases on the same server for which I can run view cleanup without
error. I don't think this is the busiest database.
[11:15] <+jan____> breun: technically what happens is that there's a process
inside Erlang that we call to get information about the design doc you want to
compact, that request for information times out when reading the design doc
[11:15] <+jan____> if that makes sense
[11:16] <+jan____> breun: do you have more stacktrace?
[11:16] <breun> jan____: I can request all design docs (it has two) for that
database just fine via Futon.
[11:17] <+jan____> weird
[11:18] <+benoitc> how is the disk usage?
[11:19] <breun> jan____: I don't have more stacktrace, no. /_log shows about
the same info: http://friendpaste.com/1BmLsiY4y4VgENlmxtMref
[11:19] <breun> benoitc: I don't know, I'm not sure I can access that
information in this production environment. :S Let me check.
[11:22] <+jan____> breun: ok, I see where the timeout happens, but I don't
quite know how it can happen
[11:22] <+jan____> breun: is there anything else running on that design doc,
compaction, a long view build, anything?
[11:24] <breun> jan____: According to the status page there is nothing running.
I can reproduce this timeout every time. It's about 5 seconds or so, I think?
[11:24] <+jan____> benoitc: the timeout is when sending a msg to the
couch_view_group gen_server, not sure if that is disk bound
[11:25] <+jan____> breun: yes, 5 seconds is the timeout.
[11:26] <+benoitc> jan____: yes, i didn't read the code yet, but i supposed it
happened when passing results to that
[11:26] <+benoitc> but well i'm not familiar at all with the view cleanup code
[11:27] <+jan____> benoitc: I'm looking at the code, the only IO that
handle_call for get_group_info does is couch_file:size()
[11:27] <+jan____> not saying it isn't significant, but seems weird.
[11:28] <+benoitc> yup
[11:29] <+jan____> breun: I'm trying to find out what the cause for this is and
what you can do with this now. do you have the option to restart the couch?
[11:31] <breun> jan____: Restarting CouchDB has been the solution to all
problems I ever brought to this channel. I thought CouchDB wasn't built on
Windows? :)
[11:32] <+jan____> breun: lol :)
[11:32] <breun> jan____: Let me see if I can restart it. But then we might
never find out what's wrong here, right?
[11:32] <+jan____> breun: you could log into the erlang instance and just kill
the view server pid, but you said you don't have much access
[11:32] <+jan____> breun: let's record this instance in an issue
[11:33] <breun> jan____: I definitely can't log into the erlang instance. I
don't have shell access to the server it's running on.
[11:33] <+jan____> the module in question isn't too big, I figure a review
would possibly find a race condition or somesuch
[11:34] <+jan____> either way though this calls for better instrumentation and
more fine-grained control over components running inside couch
[11:37] <breun> jan____: I'll request a restart and see if that helps. And I'll
create a ticket for the issue. Thanks for looking.
[11:39] <+jan____> breun: no probs, this really shouldn't happen
[11:40] <+jan____> or if it does, we should have better ways to rectify the
situation
----
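To make the failure mode in the transcript concrete: gen_server:call/2 waits
5000 ms for a reply by default, and exits with
{timeout, {gen_server, call, [Pid, Request]}} if none arrives, which is the
shape of the error in the log. Below is a rough Python analogy, not CouchDB
code. The Server class, CallTimeout, slow_job and the returned dict are all
hypothetical names made up for illustration. It only shows that a caller times
out when a single-mailbox server is busy with something else, whatever that
something turns out to be in this case.

```python
import queue
import threading
import time

# gen_server:call/2 uses a 5000 ms default timeout.
DEFAULT_CALL_TIMEOUT = 5.0


class CallTimeout(Exception):
    """Stands in for exit({timeout, {gen_server, call, [Pid, Request]}})."""


class Server(threading.Thread):
    """Hypothetical single-mailbox server: handles one message at a time."""

    def __init__(self):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()

    def run(self):
        while True:
            handler, reply_box = self.mailbox.get()
            reply_box.put(handler())

    def call(self, handler, timeout=DEFAULT_CALL_TIMEOUT):
        """Send a request and block until a reply arrives or the timeout expires."""
        reply_box = queue.Queue(maxsize=1)
        self.mailbox.put((handler, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            raise CallTimeout(handler.__name__) from None


def slow_job():
    # Placeholder for whatever keeps the server busy (unknown in this issue).
    time.sleep(0.5)
    return "done"


def request_group_info():
    # Placeholder reply; the real handler is said to call couch_file:size().
    return {"disk_size": 0}


server = Server()
server.start()
server.mailbox.put((slow_job, queue.Queue()))  # server is now occupied

try:
    server.call(request_group_info, timeout=0.1)  # shortened from 5 s for the demo
except CallTimeout as exc:
    print("timeout calling", exc)
```

In this model the request is still sitting in the mailbox after the caller has
given up, so retrying only helps once the server gets around to handling
messages again.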
> Starting view cleanup fails with a timeout
> ------------------------------------------
>
> Key: COUCHDB-1343
> URL: https://issues.apache.org/jira/browse/COUCHDB-1343
> Project: CouchDB
> Issue Type: Bug
> Affects Versions: 1.0.2
> Environment: Linux
> Reporter: Nils Breunese
> Priority: Minor
>
> Our CouchDB maintenance script (daily compaction, view cleanup, etc.)
> recently started reporting the following error every day:
> ----
> Error cleaning up views of database 'mashup' for the CouchDB instance at
> http://hostname:8080
> ----
> When trying to start view cleanup for this particular database (there are
> more databases in this CouchDB instance) I get the following in the log:
> ----
> [Thu, 17 Nov 2011 09:28:23 GMT] [error] [<0.6547.171>] Uncaught error in HTTP
> request: {exit,
> {timeout,
> {gen_server,call,
> [<0.19070.94>,request_group_info]}}}
> ----
> And the following HTTP 500 response:
> ----
> HTTP/1.1 500 Internal Server Error
> Content-Length: 83
> Server: CouchDB/1.0.2 (Erlang OTP/R13B)
> Date: Thu, 17 Nov 2011 09:28:23 GMT
> Content-Type: text/plain;charset=utf-8
> Cache-Control: must-revalidate
>
> {"error":"timeout","reason":"{gen_server,call,[<0.19070.94>,request_group_info]}"}
> ----
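For anyone who wants to reproduce this from a script, here is a minimal sketch
of the view cleanup step, assuming Python 3 and only the standard library. It
is not our actual maintenance script; cleanup_views and parse_couch_error are
hypothetical helper names. The Content-Type header is included because the
CouchDB documentation lists it as a request header for this endpoint.

```python
import json
import urllib.error
import urllib.request


def parse_couch_error(body):
    """Return (error, reason) from a CouchDB JSON error body, else (None, body)."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None, body
    if not isinstance(doc, dict):
        return None, body
    return doc.get("error"), doc.get("reason")


def cleanup_views(base_url, db, http_timeout=30):
    """POST to /<db>/_view_cleanup and return (ok, error, reason)."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_view_cleanup",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=http_timeout) as resp:
            resp.read()
            return True, None, None
    except urllib.error.HTTPError as exc:
        # A 500 like the one above lands here; surface CouchDB's own fields.
        body = exc.read().decode("utf-8", "replace")
        error, reason = parse_couch_error(body)
        return False, error, reason
```

Calling cleanup_views("http://hostname:8080", "mashup") against the affected
database would be expected to return False together with the "timeout" error
and the gen_server reason string shown in the response above. Connection
failures (urllib.error.URLError) are deliberately left to propagate.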