Arun M J created MESOS-10203:
--------------------------------
Summary: Agent process crashes on newer Linux kernels if
'linux/capabilities' isolation is enabled
Key: MESOS-10203
URL: https://issues.apache.org/jira/browse/MESOS-10203
Project: Mesos
Issue Type: Bug
Components: agent
Reporter: Arun M J
The Mesos agent crashes with the following stack trace on newer Linux kernels
(>= 5.8.x) if started with MESOS_ISOLATION=linux/capabilities.
Tested on 5.7.19, where it ran fine, but it fails on 5.8.18, 5.9.11 and 5.10.
{quote}{{Dec 13 05:08:28 mesosbox mesos-agent[465]: sh: hadoop: command not
found}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: I1213 05:08:28.234824 458
fetcher.cpp:66] Skipping URI fetcher plugin 'hadoop' as it could not be
created: Failed to create HDFS client: Hadoop client is not available, exit
status: 32512}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: Reached unreachable statement at
linux/capabilities.cpp:497}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: *** Aborted at 1607836108 (unix
time) try "date -d @1607836108" if you are using GNU date ***}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: PC: @ 0x7f875bd62387 __GI_raise}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: *** SIGABRT (@0x1ca) received by
PID 458 (TID 0x7f8760ddca00) from PID 458; stack trace: ***}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875c626630 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd62387 __GI_raise}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd63a78 __GI_abort}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875e60f237 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef6e7c1 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef723cc (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef70c96 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875f05389d (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed837fc (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed72332 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ecf54c6 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1a256 (unknown)}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd4e555
__libc_start_main}}
{{Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1d10f (unknown)}}
{{Dec 13 05:08:28 mesosbox kernel: audit: type=1701 audit(1607836108.250:274):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=4772
comm="mesos-agent" exe="/usr/sbin/mesos-agent" sig=6 res=1}}
{quote}
On further investigation, I found that the abort is raised from
[linux/capabilities.cpp|https://github.com/apache/mesos/blob/206da612c0aada0b1d86beb63660d9083b774894/src/linux/capabilities.cpp#L495-L502]
which converts capability enum values to human-readable names.
{code:cpp}
ostream& operator<<(ostream& stream, const Capability& capability)
{
  switch (capability) {
    case CHOWN: return stream << "CHOWN";
    case DAC_OVERRIDE: return stream << "DAC_OVERRIDE";
    case AUDIT_READ: return stream << "AUDIT_READ";
    ...
    ...
    case MAX_CAPABILITY: UNREACHABLE(); // !!! Crash site
  }

  UNREACHABLE();
}
{code}
[MAX_CAPABILITY|https://github.com/apache/mesos/blob/206da612c0aada0b1d86beb63660d9083b774894/src/linux/capabilities.hpp#L75]
is defined as *38*. Since then, new capabilities have been introduced in
Linux, namely:
* *CAP_PERFMON*=38 (since Linux 5.8) - Employ various performance-monitoring mechanisms
* *CAP_BPF*=39 (since Linux 5.8) - Employ privileged BPF operations
* *CAP_CHECKPOINT_RESTORE*=40 (since Linux 5.9) - Allow checkpoint/restore related operations
ref:
[https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h]
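The gap between what the running kernel supports and what a binary was compiled to know about can be checked at runtime. The following is only a sketch: it assumes the kernel exposes its highest capability number in /proc/sys/kernel/cap_last_cap, and the function names (parseLastCap, readKernelLastCap, unknownCapabilityCount) are hypothetical and not part of Mesos.

```cpp
#include <fstream>
#include <istream>
#include <sstream>

// Parses the content of /proc/sys/kernel/cap_last_cap (a single decimal
// number). Returns -1 if the input does not start with a non-negative integer.
inline int parseLastCap(std::istream& in)
{
  int value = -1;
  if (in >> value && value >= 0) {
    return value;
  }
  return -1;
}

// Reads the highest capability number known to the running kernel.
// Returns -1 if the file is missing or unreadable.
inline int readKernelLastCap()
{
  std::ifstream file("/proc/sys/kernel/cap_last_cap");
  if (!file) {
    return -1;
  }
  return parseLastCap(file);
}

// Counts the capabilities the kernel supports that the binary does not
// know. `compiledMax` is assumed to be one past the highest capability
// number the binary knows (38 for the MAX_CAPABILITY quoted in this report).
inline int unknownCapabilityCount(int kernelLastCap, int compiledMax)
{
  const int extra = kernelLastCap - compiledMax + 1;
  return extra > 0 ? extra : 0;
}
```

With the numbers from this report, a 5.9 kernel (last capability 40) against a compiled limit of 38 gives three unknown capabilities, while a 5.7 kernel (last capability 37) gives none.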
The Mesos code above does not account for the kernel gaining capabilities over
time, so every capability newly added to the kernel breaks the isolator.
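One possible direction, shown purely as an illustration and not as the actual Mesos fix, is to make the stream operator tolerate values it does not recognize instead of aborting. The enum below is a trimmed, hypothetical stand-in for the Mesos `Capability` enum, and `capabilityName` is a helper invented for this sketch.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down stand-in for the Mesos `Capability` enum.
// Only a few values are listed to keep the sketch short.
enum Capability : int
{
  CHOWN = 0,
  DAC_OVERRIDE = 1,
  AUDIT_READ = 37,
  MAX_CAPABILITY = 38
};

// Prints the known name where there is one and a numeric placeholder
// otherwise, so a capability added by a newer kernel does not abort
// the process.
inline std::ostream& operator<<(
    std::ostream& stream, const Capability& capability)
{
  switch (static_cast<int>(capability)) {
    case CHOWN:        return stream << "CHOWN";
    case DAC_OVERRIDE: return stream << "DAC_OVERRIDE";
    case AUDIT_READ:   return stream << "AUDIT_READ";
    default:           break;
  }

  return stream << "UNKNOWN(" << static_cast<int>(capability) << ")";
}

// Hypothetical convenience helper: formats a raw capability number.
inline std::string capabilityName(int value)
{
  std::ostringstream out;
  out << static_cast<Capability>(value);
  return out.str();
}
```

Under these assumptions, `capabilityName(0)` yields `CHOWN`, while `capabilityName(38)` (CAP_PERFMON on Linux 5.8 and later) yields `UNKNOWN(38)` rather than hitting an unreachable branch. Whether unknown capabilities should be printed, skipped, or rejected earlier when the kernel's set is read is a design choice this sketch does not settle.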