On Mar 18, 2012, at 21:46 , Randall Leeds wrote:
> On Sun, Mar 18, 2012 at 13:39, Jan Lehnardt <[email protected]> wrote:
>
>>
>> On Mar 18, 2012, at 21:28 , Randall Leeds wrote:
>>
>>> On Sun, Mar 18, 2012 at 11:08, Stefan Kögl <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Another thing I noticed during my tests of CouchDB 1.2.x. I redirected
>>>> live traffic to the instance and after a rather short time, requests
>>>> were failing with the following information in the logs:
>>>>
>>>>
>>>> [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
>>>> {error_report,<0.31.0>,
>>>> {<0.27554.2>,std_error,
>>>> [{application,mochiweb},
>>>> "Accept failed error",
>>>> "{error,emfile}"]}}
>>>> [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
>>>> {error_report,<0.31.0>,
>>>> {<0.27554.2>,crash_report,
>>>> [[{initial_call,
>>>> {mochiweb_acceptor,init,
>>>> ['Argument__1','Argument__2',
>>>> 'Argument__3']}},
>>>> {pid,<0.27554.2>},
>>>> {registered_name,[]},
>>>> {error_info,
>>>> {exit,
>>>> {error,accept_failed},
>>>> [{mochiweb_acceptor,init,3},
>>>> {proc_lib,init_p_do_apply,3}]}},
>>>> {ancestors,
>>>> [couch_httpd,couch_secondary_services,
>>>> couch_server_sup,<0.32.0>]},
>>>> {messages,[]},
>>>> {links,[<0.129.0>]},
>>>> {dictionary,[]},
>>>> {trap_exit,false},
>>>> {status,running},
>>>> {heap_size,233},
>>>> {stack_size,24},
>>>> {reductions,244}],
>>>> []]}}
>>>>
>>>>
>>>> I think "emfile" means that CouchDB (or mochiweb?) couldn't open any
>>>> more files / connections. I've set the (hard and soft) nofile limit for
>>>> user couchdb to 4096, but didn't raise the ERL_MAX_PORTS accordingly.
>>>> Anyway, as soon as the error occurred, CouchDB started writing most of my
>>>> view files from scratch, rendering the instance unusable.
>>>>
>>>> I'd expect CouchDB to fail more gracefully when the maximum number of
>>>> open files is reached. Is this a bug or expected behaviour?
>>>>
>>>
>>> Looks like a bug. Whenever there's a problem opening a view file,
>>> couch_view tries to delete it. Clearly, this is not the right course of
>>> action when the problem is due to emfile.
>>
>> This looks rather serious. I opened a JIRA:
>>
>> https://issues.apache.org/jira/browse/COUCHDB-1445
>>
>> And started collecting the info. Bob N's message came in in the meantime
>> and I agree, we should see if there are more cases where we need to be
>> careful.
>>
>> Also, I'd consider this blocking for 1.2.0.
>>
>> Anyone who can pitch in with their expertise is more than welcome! :)
>>
>
> Assigned to me. Patch forthcoming. Agree it should block 1.2.0, especially
> because upgrades are the sort of thing where bad packaging downstream
> might cause custom ERL_MAX_PORTS settings to be overwritten, and we wouldn't
> want anyone's production to have its views erased needlessly.
Thanks for taking this on, Randall!
Cheers
Jan
--
>
> -Randall
>
>
>>
>> Cheers
>> Jan
>> --
>>
>>
>>>
>>> Here's a patch that I think might fix it. I'd like to hear from another
>>> dev on this, and on whether there's a better way to bail out.
>>>
>>> diff --git a/src/couchdb/couch_view_group.erl b/src/couchdb/couch_view_group.erl
>>> index 97fc512..ab075bd 100644
>>> --- a/src/couchdb/couch_view_group.erl
>>> +++ b/src/couchdb/couch_view_group.erl
>>> @@ -469,6 +469,10 @@ open_index_file(RootDir, DbName, GroupSig) ->
>>> case couch_file:open(FileName) of
>>> {ok, Fd} -> {ok, Fd};
>>> {error, enoent} -> couch_file:open(FileName, [create]);
>>> + {error, emfile} ->
>>> + ?LOG_ERROR("Could not open file for view index: max open files reached. "
>>> + "Raise ERL_MAX_PORTS or system limits.", []),
>>> + throw({error, emfile});
>>> Error -> Error
>>> end.
>>
>>
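
To make the idea in the patch concrete, here is a minimal standalone sketch of
"treat descriptor exhaustion differently from a missing or bad file". It is not
the CouchDB code: the module name open_index_sketch, the helper
is_fd_exhaustion/1 and the {retry_later, Reason} tuple are made up for
illustration, and it uses the standard file module where CouchDB uses
couch_file. Treating enfile the same way as emfile is also an assumption that
goes beyond the patch above.

```erlang
-module(open_index_sketch).
-export([open_index_file/1, is_fd_exhaustion/1]).

%% Hypothetical helper: POSIX reasons that mean "out of file descriptors".
%% In both cases the file on disk is presumably intact, so it should be kept.
is_fd_exhaustion(emfile) -> true;   % per-process descriptor limit reached
is_fd_exhaustion(enfile) -> true;   % system-wide file table is full
is_fd_exhaustion(_Other) -> false.

%% Open an existing index file, creating a new one only when it is missing.
%% The existing file is opened read-only here to keep the sketch short.
%% Returns {ok, Fd}, {error, {retry_later, Reason}} or {error, Reason}.
open_index_file(FileName) ->
    case file:open(FileName, [read, raw, binary]) of
        {ok, Fd} ->
            {ok, Fd};
        {error, enoent} ->
            %% Nothing on disk yet, so starting from scratch loses nothing.
            file:open(FileName, [read, write, raw, binary]);
        {error, Reason} ->
            case is_fd_exhaustion(Reason) of
                %% Tell the caller to back off instead of deleting the index.
                true -> {error, {retry_later, Reason}};
                false -> {error, Reason}
            end
    end.
```

A caller would match on {error, {retry_later, _}} and leave the index file
alone, rather than falling through to whatever cleanup it runs for other
errors.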