[
https://issues.apache.org/jira/browse/MESOS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Mann reassigned MESOS-5988:
--------------------------------
Assignee: Greg Mann
> PollSocketImpl can write to a stale fd.
> ---------------------------------------
>
> Key: MESOS-5988
> URL: https://issues.apache.org/jira/browse/MESOS-5988
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Benjamin Mahler
> Assignee: Greg Mann
> Priority: Blocker
> Labels: mesosphere
> Fix For: 1.0.1
>
>
> When tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar]. We
> were curious why PollSocketImpl avoids the same issue. It seems that
> PollSocketImpl has a similar race, however in the case of PollSocketImpl we
> will simply write to a stale file descriptor.
> One example is {{PollSocketImpl::send(const char*, size_t)}}:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
> {code}
> Future<size_t> PollSocketImpl::send(const char* data, size_t size)
> {
> return io::poll(get(), io::WRITE)
> .then(lambda::bind(&internal::socket_send_data, get(), data, size));
> }
> Future<size_t> socket_send_data(int s, const char* data, size_t size)
> {
> CHECK(size > 0);
> while (true) {
> ssize_t length = send(s, data, size, MSG_NOSIGNAL);
> #ifdef __WINDOWS__
> int error = WSAGetLastError();
> #else
> int error = errno;
> #endif // __WINDOWS__
> if (length < 0 && net::is_restartable_error(error)) {
> // Interrupted, try again now.
> continue;
> } else if (length < 0 && net::is_retryable_error(error)) {
> // Might block, try again later.
> return io::poll(s, io::WRITE)
> .then(lambda::bind(&internal::socket_send_data, s, data, size));
> } else if (length <= 0) {
> // Socket error or closed.
> if (length < 0) {
> const string error = os::strerror(errno);
> VLOG(1) << "Socket error while sending: " << error;
> } else {
> VLOG(1) << "Socket closed while sending";
> }
> if (length == 0) {
> return length;
> } else {
> return Failure(ErrnoError("Socket send failed"));
> }
> } else {
> CHECK(length > 0);
> return length;
> }
> }
> }
> {code}
> If the last reference to the {{Socket}} goes away before the
> {{socket_send_data}} loop completes, then we will write to a stale fd!
> It turns out that we have avoided this issue because in libprocess we happen
> to keep a reference to the {{Socket}} around when sending:
> https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
> {code}
> void send(Encoder* encoder, Socket socket)
> {
> switch (encoder->kind()) {
> case Encoder::DATA: {
> size_t size;
> const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
> socket.send(data, size)
> .onAny(lambda::bind(
> &internal::_send,
> lambda::_1,
> socket,
> encoder,
> size));
> break;
> }
> case Encoder::FILE: {
> off_t offset;
> size_t size;
> int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
> socket.sendfile(fd, offset, size)
> .onAny(lambda::bind(
> &internal::_send,
> lambda::_1,
> socket,
> encoder,
> size));
> break;
> }
> }
> }
> {code}
> However, this may not be true in all call-sites going forward. Currently, it
> appears that http::Connection can trigger this bug.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)