[ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390182#comment-15390182
 ] 

Joseph Wu commented on MESOS-5882:
----------------------------------

Here's a list of callsites for {{os::cloexec}} and a summary of why 
{{os::cloexec}} matters in the callsite:
|| Location || Reason ||
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/subprocess.cpp#L140-L144
 | When we open a PIPE to a subprocess, those pipes are kept open in the 
parent.  We need CLOEXEC to prevent future subprocesses from inheriting these 
incorrectly. |
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L264
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L419
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L476
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L535
 | {{io::peek}}, {{io::read}}, {{io::write}}, and {{io::redirect}} duplicate an 
FD to control the FD lifecycle asynchronously.  We CLOEXEC to prevent FD leaks. 
|
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/socket.cpp#L62
 | Sockets will CLOEXEC on create to prevent leaks. |
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L490
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L827
 | SSL sockets CLOEXEC to prevent leaks.  This happens on socket create 
(because we duplicate the FD to control lifetime separately) and accept. |
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/poll_socket.cpp#L69
 | Poll sockets CLOEXEC to prevent leaks.  This happens on accept. |
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/posix/write.hpp#L66-L69
 | A {{os::write}} helper CLOEXECs FDs to prevent leaks. |
| 
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/open.hpp#L63-L81
 | {{os::open}} has some CLOEXEC logic, but doesn't CLOEXEC by default. |
| All calls to {{os::open}} except in:
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/sunos.hpp
  
https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/touch.hpp#L34
  A couple tests | Except for some Solaris code (possibly not supported 
anymore) and {{os::touch}} (which opens and immediately closes an FD; this 
appears to be a possible leak), we CLOEXEC everywhere outside of tests to 
prevent leaks. |

> `os::cloexec` does not exist on Windows
> ---------------------------------------
>
>                 Key: MESOS-5882
>                 URL: https://issues.apache.org/jira/browse/MESOS-5882
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>            Reporter: Alex Clemmer
>            Assignee: Alex Clemmer
>              Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it is possible that this is riskier to Windows 
> in the longer term, as the semantics of `cloexec` may have subtle difference 
> between Linux and Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to