Andrew Onischuk created AMBARI-18464:
----------------------------------------

             Summary: Provide Warnings When ulimit Is High To Prevent Heartbeat Lost Issues
                 Key: AMBARI-18464
                 URL: https://issues.apache.org/jira/browse/AMBARI-18464
             Project: Ambari
          Issue Type: Bug
            Reporter: Andrew Onischuk
            Assignee: Andrew Onischuk
             Fix For: 3.0.0
         Attachments: AMBARI-18464.patch

Python's `Popen` constructor takes an optional argument called `close_fds`,
which instructs Python to close all file descriptors in the child process
except stdin, stdout, stderr, and the internal pipe `Popen` uses to report
errors. However, Python's logic iterates over every possible descriptor number
up to the open-file limit, not just the descriptors that are actually open.
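
The difference between the two strategies can be illustrated with a minimal
sketch. This is not Python's actual implementation; `brute_force_candidates`
and `open_fds` are hypothetical helper names, and the `/proc/self/fd` lookup
assumes a Linux host.

```python
import os


def brute_force_candidates(max_fd):
    """FD numbers a close-everything loop visits: 3 up to max_fd, open or not."""
    return range(3, max_fd)


def open_fds(fd_dir="/proc/self/fd"):
    """FD numbers that are actually open. Assumes a Linux-style /proc."""
    return sorted(int(name) for name in os.listdir(fd_dir) if name.isdigit())


if __name__ == "__main__":
    # The loop length depends only on the limit, not on what is open.
    for limit in (1024, 1000000):
        attempts = len(brute_force_candidates(limit))
        print("limit %d: %d close attempts" % (limit, attempts))
    if os.path.isdir("/proc/self/fd"):
        print("actually open: %d" % len(open_fds()))
```

The work done by the brute-force loop grows with the limit even when only a
handful of descriptors are open.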

  * With my `ulimit -n 1024`, `Popen` was taking ~2ms
  * With my `ulimit -n 1000000`, `Popen` was taking ~150ms

That's an increase of 7400% (75x), and all I did was raise my ulimit. The
number of FDs actually open was the same in both runs.
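
A rough way to repeat this measurement is sketched below. `time_popen` is a
hypothetical helper, not part of Ambari; run the script under different
`ulimit -n` values and compare. Absolute numbers will differ by machine, and
newer Python versions implement `close_fds` differently, so the gap may not
appear there.

```python
import resource
import subprocess
import sys
import time


def time_popen(close_fds, runs=5):
    """Mean seconds spent inside the Popen constructor for a trivial child."""
    cmd = [sys.executable, "-c", "pass"]
    total = 0.0
    for _ in range(runs):
        start = time.time()
        proc = subprocess.Popen(cmd, close_fds=close_fds)
        total += time.time() - start
        proc.wait()  # reap the child outside the timed region
    return total / runs


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft open-file limit: %s" % soft)
    print("close_fds=True:  %.1f ms" % (time_popen(True) * 1000))
    print("close_fds=False: %.1f ms" % (time_popen(False) * 1000))
```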

In some environments, a single `Popen` call can take between 6 and 60 seconds.
Status commands then cannot be drained fast enough, so the agent stops
responding to heartbeats and stops running commands.

This Jira serves two purposes:

  * Investigate our use of `close_fds` and determine whether it's correct (or at 
least needs to be parameterized as a configuration option).
  * Provide a host check warning for the ulimit being too high.
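
The second item could look roughly like the sketch below. It is only an
illustration: the function name `check_open_files_limit` and the threshold
value are assumptions, since this issue does not specify a cutoff or where the
check would live in the agent.

```python
import resource

# Hypothetical threshold; the issue does not state what counts as "too high".
ULIMIT_WARN_THRESHOLD = 65536


def check_open_files_limit(threshold=ULIMIT_WARN_THRESHOLD, soft_limit=None):
    """Return a warning string if the open-file limit is high, else None."""
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return "WARNING: open-file limit is unlimited; Popen may be slow"
    if soft_limit > threshold:
        return ("WARNING: open-file limit %d exceeds %d; Popen may be slow"
                % (soft_limit, threshold))
    return None


if __name__ == "__main__":
    message = check_open_files_limit()
    print(message if message else "open-file limit OK")
```

The optional `soft_limit` argument exists only so the check can be exercised
without changing the real limit of the running process.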





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
