[jira] [Commented] (COUCHDB-1343) Starting view cleanup fails with a timeout

Nils Breunese (Commented) (JIRA) Thu, 17 Nov 2011 04:18:19 -0800

    [ https://issues.apache.org/jira/browse/COUCHDB-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152017#comment-13152017 ]


Nils Breunese commented on COUCHDB-1343:
----------------------------------------

I discussed this issue in #couchdb on Freenode. Here's a transcript of the 
discussion about this issue:

----
<breun> Hi all. I have a production database here and if I POST to 
/db/_view_cleanup I get a 500 Internal Server Error like this: 
http://friendpaste.com/7YSu5IImVg6HAWzmIrqqG3 
[10:32] <breun> That 'reason' doesn't make much sense to me. What's next? 
[11:02] <+jan____> breun: lemme see 
[11:03] <+jan____> breun: this looks like heavy IO? 
[11:12] <breun> jan____: Hm, I only get this error for this one database. There 
are other databases on the same server for which I can run view cleanup without 
error. I don't think this is the busiest database. 
[11:15] <+jan____> breun: technically what happens is that there's a process 
inside Erlang that we call to get information about the design doc you want to 
compact, that request for information times out when reading the design doc 
[11:15] <+jan____> if that makes sense 
[11:16] <+jan____> breun: do you have more stacktrace? 
[11:16] <breun> jan____: I can request all design docs (it has two) for that 
database just fine via Futon. 
[11:17] <+jan____> weird 
[11:18] <+benoitc> how is the disk usage? 
[11:19] <breun> jan____: I don't have more stacktrace, no. /_log shows about 
the same info: http://friendpaste.com/1BmLsiY4y4VgENlmxtMref 
[11:19] <breun> benoitc: I don't know, I'm not sure I can access that 
information in this production environment. :S Let me check. 
[11:22] <+jan____> breun: ok, I see where the timeout happens, but I don't 
quite know how it can happen 
[11:22] <+jan____> breun: is there anything else running on that design doc, 
compaction, a long view build, anything? 
[11:24] <breun> jan____: According to the status page there is nothing running. 
I can reproduce this timeout every time. It's about 5 seconds or so, I think? 
[11:24] <+jan____> benoitc: the timeout is when sending a msg to the 
couch_view_group gen_server, not sure if that is disk bound 
[11:25] <+jan____> breun: yes, 5 seconds is the timeout. 
[11:26] <+benoitc> jan____: yes, I didn't read the code yet, but I supposed it 
happened when passing results to that 
[11:26] <+benoitc> but well I'm not familiar at all with the view cleanup code 
[11:27] <+jan____> benoitc: I'm looking at the code, the only IO that 
handle_call for request_group_info does is couch_file:size() 
[11:27] <+jan____> not saying it isn't significant, but seems weird. 
[11:28] <+benoitc> yup 
[11:29] <+jan____> breun: I'm trying to find out what the cause for this is and 
what you can do with this now. do you have the option to restart the couch? 
[11:31] <breun> jan____: Restarting CouchDB has been the solution to all 
problems I ever brought to this channel. I thought CouchDB wasn't built on 
Windows? :) 
[11:32] <+jan____> breun: lol :) 
[11:32] <breun> jan____: Let me see if I can restart it. But then we might 
never find out what's wrong here, right? 
[11:32] <+jan____> breun: you could log into the erlang instance and just kill 
the view server pid, but you said you don't have much access 
[11:32] <+jan____> breun: let's record this instance in an issue 
[11:33] <breun> jan____: I definitely can't log into the erlang instance. I 
don't have shell access to the server it's running on. 
[11:33] <+jan____> the module in question isn't too big, I figure a review 
would possibly find a race condition or somesuch 
[11:34] <+jan____> either way though this calls for better instrumentation and 
more fine-grained control over components running inside couch 
[11:37] <breun> jan____: I'll request a restart and see if that helps. And I'll 
create a ticket for the issue. Thanks for looking. 
[11:39] <+jan____> breun: no probs, this really shouldn't happen 
[11:40] <+jan____> or if it does, we should have better ways to rectify the 
situation 
----
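
To illustrate the mechanism discussed above (a caller waits a fixed time for a reply from a single server process, and exits with a timeout if no reply arrives), here is a rough Python analogy. It is not CouchDB code: the names call, server_loop, TimeoutExit and the "busy" handler are made up for this sketch. It shows only one general way a cheap request can time out, namely when the server is still occupied with an earlier message. Whether that is what happened in this instance is not known.

```python
import queue
import threading
import time


class TimeoutExit(Exception):
    """Raised when no reply arrives in time (hypothetical name)."""


def call(mailbox, request, timeout=5.0):
    """Send a request to the server thread and wait for its reply.

    The 5 second default mirrors the timeout mentioned in the discussion.
    """
    reply_box = queue.Queue(maxsize=1)
    mailbox.put((request, reply_box))
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutExit(("call", request))


def server_loop(mailbox, handlers):
    """Handle requests strictly one at a time, like a single server process."""
    while True:
        request, reply_box = mailbox.get()
        if request == "stop":
            reply_box.put("ok")
            return
        reply_box.put(handlers[request]())


if __name__ == "__main__":
    mailbox = queue.Queue()
    handlers = {
        # Assumption for the demo: some earlier message keeps the server busy.
        "busy": lambda: time.sleep(2.0),
        "request_group_info": lambda: "info",
    }
    threading.Thread(
        target=server_loop, args=(mailbox, handlers), daemon=True
    ).start()

    print(call(mailbox, "request_group_info"))      # answered immediately
    mailbox.put(("busy", queue.Queue(maxsize=1)))   # occupy the server
    try:
        call(mailbox, "request_group_info", timeout=0.5)
    except TimeoutExit as exc:
        print("timeout:", exc)
```

In this sketch the handler for request_group_info is trivial, yet the second call still times out because the request sits in the mailbox behind the slow one.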
                
> Starting view cleanup fails with a timeout
> ------------------------------------------
>
>                 Key: COUCHDB-1343
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1343
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>         Environment: Linux
>            Reporter: Nils Breunese
>            Priority: Minor
>
> Our CouchDB maintenance script (daily compaction, view cleanup, etc.) 
> recently started reporting the following error every day:
> ----
> Error cleaning up views of database 'mashup' for the CouchDB instance at 
> http://hostname:8080
> ----
> When trying to start view cleanup for this particular database (there are 
> more databases in this CouchDB instance) I get the following in the log:
> ----
> [Thu, 17 Nov 2011 09:28:23 GMT] [error] [<0.6547.171>] Uncaught error in HTTP 
> request: {exit,
>                                  {timeout,
>                                   {gen_server,call,
>                                    [<0.19070.94>,request_group_info]}}}
> ----
> And the following HTTP 500 response:
> ----
> HTTP/1.1 500 Internal Server Error
> Content-Length: 83
> Server: CouchDB/1.0.2 (Erlang OTP/R13B)
> Date: Thu, 17 Nov 2011 09:28:23 GMT
> Content-Type: text/plain;charset=utf-8
> Cache-Control: must-revalidate
> {"error":"timeout","reason":"{gen_server,call,[<0.19070.94>,request_group_info]}"}
> ----
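
For reference, a minimal sketch of how a maintenance script might start view cleanup and recognise this particular failure. This is not the script mentioned in the issue; the function names start_view_cleanup and describe_cleanup_failure are hypothetical, and the sketch assumes that any non-error response from POST /db/_view_cleanup means the cleanup was started.

```python
import json
import urllib.error
import urllib.request


def describe_cleanup_failure(status, body):
    """Turn a failed _view_cleanup response into a one-line message."""
    try:
        doc = json.loads(body)
    except ValueError:
        doc = {}
    if not isinstance(doc, dict):
        doc = {}
    error = doc.get("error", "unknown")
    reason = doc.get("reason", "")
    # The response in this issue: error "timeout" with a gen_server call
    # as the reason.
    if status == 500 and error == "timeout" and "gen_server" in reason:
        return "internal gen_server call timed out: " + reason
    return "HTTP %d %s: %s" % (status, error, reason)


def start_view_cleanup(base_url, db, timeout=30):
    """POST /db/_view_cleanup; return None on success, else an error message."""
    req = urllib.request.Request(
        "%s/%s/_view_cleanup" % (base_url.rstrip("/"), db),
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            return None
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8", "replace")
        return describe_cleanup_failure(exc.code, body)


if __name__ == "__main__":
    # Host and database name taken from the error message quoted above.
    message = start_view_cleanup("http://hostname:8080", "mashup")
    if message is not None:
        print("Error cleaning up views of database 'mashup':", message)
```

Connection-level failures (urllib.error.URLError) are deliberately not handled here and would propagate to the caller.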

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
