Andrew Onischuk created AMBARI-18464:
----------------------------------------
Summary: Provide Warnings When ulimit Is High To Prevent Heartbeat
Lost Issues
Key: AMBARI-18464
URL: https://issues.apache.org/jira/browse/AMBARI-18464
Project: Ambari
Issue Type: Bug
Reporter: Andrew Onischuk
Assignee: Andrew Onischuk
Fix For: 3.0.0
Attachments: AMBARI-18464.patch
Python's `Popen` constructor takes an optional argument called `close_fds`
which instructs Python to close all inherited file descriptors in the child
except the standard streams (stdin, stdout, and stderr). However, Python's
logic iterates over every possible descriptor number up to the open-file
limit, not just those which are actually open.
* With my `ulimit -n 1024`, `Popen` was taking ~2ms
* With my `ulimit -n 1000000`, `Popen` was taking ~150ms
That's an increase of 7400%, and all I did was raise my ulimit. The number
of FDs actually open was the same in both runs.
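To make the scaling concrete, here is a minimal Python sketch of the arithmetic behind those two measurements. It assumes the brute-force behaviour described above (one close attempt per possible descriptor number, starting after stderr); the helper names `descriptors_scanned` and `percent_increase` are hypothetical and are not part of Ambari or of the `subprocess` module.

```python
def descriptors_scanned(soft_limit, first_fd=3):
    """Close attempts made by a loop that walks every possible descriptor
    number from first_fd up to (but not including) soft_limit.

    Assumption: descriptors 0-2 (stdin, stdout, stderr) are skipped.
    """
    return max(soft_limit - first_fd, 0)


def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / float(before) * 100.0


if __name__ == "__main__":
    # The two ulimit values and the approximate timings quoted above.
    for limit in (1024, 1000000):
        print("ulimit -n %d -> %d close attempts"
              % (limit, descriptors_scanned(limit)))
    print("2ms -> 150ms is an increase of %.0f%%" % percent_increase(2, 150))
```

The work done by such a loop depends only on the limit, not on how many descriptors are really open, which matches the observation that the FD count did not change between the two runs.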
In some environments, this `Popen` call can take between 6 and 60 seconds per
call. As a result, status commands cannot be drained fast enough, and the
agent stops responding to heartbeats and stops running commands.
This Jira serves two purposes:
* Investigate our use of `close_fds` and determine if it's correct (or at
least needs to be parameterized as a configuration option).
* Provide a host check warning for the ulimit being too high.
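One possible shape for the host check is sketched below in Python. It is not the attached patch: the function name `check_open_files_limit`, the warning text, and the threshold value are all assumptions, since this issue does not state what limit counts as too high. Only `resource.getrlimit`, `resource.RLIMIT_NOFILE`, and `resource.RLIM_INFINITY` are real standard-library names (Unix only).

```python
import resource

# Hypothetical threshold; the issue does not say what counts as "too high".
ULIMIT_WARNING_THRESHOLD = 65536


def check_open_files_limit(soft_limit=None,
                           threshold=ULIMIT_WARNING_THRESHOLD):
    """Return a warning string if the open-files soft limit exceeds
    threshold (or is unlimited), otherwise None.

    When soft_limit is not given, it is read from the current process.
    """
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

    if soft_limit == resource.RLIM_INFINITY:
        return ("WARNING: open-files limit is unlimited; "
                "Popen with close_fds may be very slow")
    if soft_limit > threshold:
        return ("WARNING: open-files limit %d exceeds %d; "
                "Popen with close_fds may be very slow"
                % (soft_limit, threshold))
    return None


if __name__ == "__main__":
    message = check_open_files_limit()
    print(message if message else "open-files limit looks OK")
```

Passing `soft_limit` explicitly keeps the check easy to exercise without changing the real limit of the host, for example `check_open_files_limit(1000000)`.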