Benjamin Mahler created MESOS-5988:
--------------------------------------
Summary: PollSocketImpl can write to a stale fd.
Key: MESOS-5988
URL: https://issues.apache.org/jira/browse/MESOS-5988
Project: Mesos
Issue Type: Bug
Components: libprocess
Reporter: Benjamin Mahler
Priority: Blocker
Fix For: 1.0.1
When tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar]. We were
curious why PollSocketImpl avoids the same issue. It seems that PollSocketImpl
has a similar race, however in the case of PollSocketImpl we will simply write
to a stale file descriptor.
One example is {{PollSocketImpl::send(const char*, size_t)}}:
https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
{code}
Future<size_t> PollSocketImpl::send(const char* data, size_t size)
{
return io::poll(get(), io::WRITE)
.then(lambda::bind(&internal::socket_send_data, get(), data, size));
}
Future<size_t> socket_send_data(int s, const char* data, size_t size)
{
CHECK(size > 0);
while (true) {
ssize_t length = send(s, data, size, MSG_NOSIGNAL);
#ifdef __WINDOWS__
int error = WSAGetLastError();
#else
int error = errno;
#endif // __WINDOWS__
if (length < 0 && net::is_restartable_error(error)) {
// Interrupted, try again now.
continue;
} else if (length < 0 && net::is_retryable_error(error)) {
// Might block, try again later.
return io::poll(s, io::WRITE)
.then(lambda::bind(&internal::socket_send_data, s, data, size));
} else if (length <= 0) {
// Socket error or closed.
if (length < 0) {
const string error = os::strerror(errno);
VLOG(1) << "Socket error while sending: " << error;
} else {
VLOG(1) << "Socket closed while sending";
}
if (length == 0) {
return length;
} else {
return Failure(ErrnoError("Socket send failed"));
}
} else {
CHECK(length > 0);
return length;
}
}
}
{code}
If the last reference to the {{Socket}} goes away before the
{{socket_send_data}} loop completes, then we will write to a stale fd!
It turns out that we have avoided this issue because in libprocess we happen to
keep a reference to the {{Socket}} around when sending:
https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
{code}
void send(Encoder* encoder, Socket socket)
{
switch (encoder->kind()) {
case Encoder::DATA: {
size_t size;
const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
socket.send(data, size)
.onAny(lambda::bind(
&internal::_send,
lambda::_1,
socket,
encoder,
size));
break;
}
case Encoder::FILE: {
off_t offset;
size_t size;
int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
socket.sendfile(fd, offset, size)
.onAny(lambda::bind(
&internal::_send,
lambda::_1,
socket,
encoder,
size));
break;
}
}
}
{code}
However, this may not be true in all call-sites going forward. Currently, it
appears that http::Connection can trigger this bug.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)