[ https://issues.apache.org/jira/browse/COUCHDB-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676584#comment-13676584 ]
Eli Stevens commented on COUCHDB-1817:
--------------------------------------
Of course, the next run had an issue. :(
Here's the stack trace from the first server 500 for the data run:
https://gist.github.com/wickedgrey/ef7b59e6b0be7ec47692
Here's the stack trace from trying to *read* the view in question:
https://gist.github.com/wickedgrey/d6cb6e0fa2190882977f
We've copied the DB files off and are ready to send them to anyone who would
like to debug the issue.
> OS Process Error <0.21247.103> :: {os_process_error, {exit_status,0}}
> ---------------------------------------------------------------------
>
> Key: COUCHDB-1817
> URL: https://issues.apache.org/jira/browse/COUCHDB-1817
> Project: CouchDB
> Issue Type: Bug
> Components: JavaScript View Server
> Reporter: Eli Stevens
> Attachments: couchdb__couchdb_files.png,
> couchdb__httpd_status_codes.png, couchdb_mem.png, loadavg.png, memory2.png
>
>
> We have started seeing errors crop up in our application that we have not
> seen before, and we're at a loss for how to start debugging it.
> [~dch] said that we might look into system resource limits, so we started
> collecting all of the output from _stats into RRD (along with memory, load,
> etc. that we were already collecting), but nothing is jumping out at us as
> obviously problematic.
> We can semi-reliably reproduce the problem, but it's far from a minimal test
> case (basically, we load up several large chunks of data, and then halfway
> through the processing run, we get the error). The error doesn't seem to
> happen if we load up each chunk by itself.
> The DB in question has about 100 docs in it, none particularly large (nothing
> over a couple KB would be my guess), with a couple hundred MB in attachments.
> There are 10ish design docs, written in CoffeeScript. In general, nothing
> seems obviously resource intensive.
> We have seen this issue on 1.2.0, 1.2.1, and we're working on getting a
> machine with 1.3.0 set up (the PPA we'd been using hasn't been updated yet).
> Ubuntu 12.04, spinning disk, etc. The system is under load when it happens,
> but the load isn't more than 1.5x the number of cores. I don't have disk IO
> numbers at hand, but I'd be surprised if that was being strained.
> Error as it appears in couch.log:
> https://gist.github.com/wickedgrey/e7fd3fc14b6d43e95564
> The design doc in question:
> https://gist.github.com/wickedgrey/db41b0c3c75a590e2109
> An example document: https://gist.github.com/wickedgrey/a8422aab261ddd2ce4fe
> We have some preliminary evidence that the problem persists after the system
> goes quiet, but we're not certain.
> Either CouchDB isn't handling things correctly, in which case this bug is
> "prz fix", or we're doing something wrong (hitting a resource limit, or
> something), in which case this bug is "prz make the error message more
> informative".
> Thanks!