[
https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840771#comment-15840771
]
Greg Mann commented on MESOS-6985:
----------------------------------
Yep, it's definitely occurring in {{::getenv}}. Here's the result of a failed
test run within {{gdb}}:
{code}
[ RUN ] MasterTest.MultipleExecutors
I0127 00:39:33.120487 1809 cluster.cpp:160] Creating default 'local' authorizer
I0127 00:39:33.122427 1815 master.cpp:383] Master
ac440d30-722b-43a5-9f61-cea98b3e576a (vagrant-ubuntu-trusty-64) started on
10.0.2.15:51845
I0127 00:39:33.122498 1815 master.cpp:385] Flags at startup: --acls=""
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true"
--authenticate_http_frameworks="true" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/b7WHq9/credentials"
--framework_sorter="drf" --help="false" --hostname_lookup="true"
--http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/b7WHq9/master"
--zk_session_timeout="10secs"
I0127 00:39:33.122836 1815 master.cpp:435] Master only allowing authenticated
frameworks to register
I0127 00:39:33.122858 1815 master.cpp:449] Master only allowing authenticated
agents to register
I0127 00:39:33.122875 1815 master.cpp:462] Master only allowing authenticated
HTTP frameworks to register
I0127 00:39:33.122891 1815 credentials.hpp:37] Loading credentials for
authentication from '/tmp/b7WHq9/credentials'
I0127 00:39:33.123128 1815 master.cpp:507] Using default 'crammd5'
authenticator
I0127 00:39:33.123265 1815 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-readonly'
I0127 00:39:33.123394 1815 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-readwrite'
I0127 00:39:33.123631 1815 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-master-scheduler'
I0127 00:39:33.123884 1815 master.cpp:587] Authorization enabled
I0127 00:39:33.127008 1819 master.cpp:2119] Elected as the leading master!
I0127 00:39:33.127084 1819 master.cpp:1641] Recovering from registrar
I0127 00:39:33.127766 1818 registrar.cpp:362] Successfully fetched the
registry (0B) in 408832ns
I0127 00:39:33.127883 1818 registrar.cpp:461] Applied 1 operations in 22092ns;
attempting to update the registry
I0127 00:39:33.130798 1818 registrar.cpp:506] Successfully updated the
registry in 2.779136ms
I0127 00:39:33.130934 1818 registrar.cpp:392] Successfully recovered registrar
I0127 00:39:33.131573 1818 master.cpp:1757] Recovered 0 agents from the
registry (153B); allowing 10mins for agents to re-register
I0127 00:39:33.134503 1809 cluster.cpp:446] Creating default 'local' authorizer
I0127 00:39:33.135774 1818 slave.cpp:209] Mesos agent started on
(8)@10.0.2.15:51845
I0127 00:39:33.135824 1818 slave.cpp:210] Flags at startup: --acls=""
--appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true"
--authenticate_http_readwrite="true" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos"
--credential="/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker"
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--http_command_executor="false"
--http_credentials="/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials"
--http_heartbeat_interval="30secs" --image_provisioner_backend="copy"
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem"
--launcher="posix" --launcher_dir="/home/vagrant/src/mesos/build/src"
--logbufsecs="0" --logging_level="INFO"
--max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="10ms"
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/MasterTest_MultipleExecutors_1wuqbP"
I0127 00:39:33.136175 1818 credentials.hpp:86] Loading credential for
authentication from '/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential'
I0127 00:39:33.136325 1818 slave.cpp:352] Agent using credential for:
test-principal
I0127 00:39:33.136358 1818 credentials.hpp:37] Loading credentials for
authentication from '/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials'
I0127 00:39:33.136541 1818 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-agent-readonly'
I0127 00:39:33.138916 1818 http.cpp:922] Using default 'basic' HTTP
authenticator for realm 'mesos-agent-readwrite'
I0127 00:39:33.142987 1818 slave.cpp:539] Agent resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0127 00:39:33.143088 1818 slave.cpp:547] Agent attributes: [ ]
I0127 00:39:33.143151 1818 slave.cpp:552] Agent hostname:
vagrant-ubuntu-trusty-64
I0127 00:39:33.143090 1809 sched.cpp:232] Version: 1.2.0
I0127 00:39:33.143712 1817 status_update_manager.cpp:177] Pausing sending
status updates
I0127 00:39:33.144261 1817 sched.cpp:336] New master detected at
[email protected]:51845
I0127 00:39:33.144701 1817 sched.cpp:407] Authenticating with master
[email protected]:51845
I0127 00:39:33.144754 1817 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0127 00:39:33.144836 1819 state.cpp:60] Recovering state from
'/tmp/MasterTest_MultipleExecutors_1wuqbP/meta'
I0127 00:39:33.145293 1819 status_update_manager.cpp:203] Recovering status
update manager
I0127 00:39:33.145570 1814 authenticatee.cpp:121] Creating new client SASL
connection
I0127 00:39:33.146090 1814 master.cpp:6842] Authenticating
[email protected]:51845
I0127 00:39:33.146564 1817 slave.cpp:5422] Finished recovery
I0127 00:39:33.147352 1814 authenticator.cpp:98] Creating new server SASL
connection
I0127 00:39:33.148704 1815 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0127 00:39:33.149062 1815 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0127 00:39:33.149545 1815 authenticator.cpp:204] Received SASL authentication
start
I0127 00:39:33.150210 1815 authenticator.cpp:326] Authentication requires more
steps
I0127 00:39:33.152232 1815 authenticatee.cpp:259] Received SASL authentication
step
I0127 00:39:33.152844 1814 slave.cpp:929] New master detected at
[email protected]:51845
I0127 00:39:33.153264 1820 status_update_manager.cpp:177] Pausing sending
status updates
I0127 00:39:33.153064 1815 authenticator.cpp:232] Received SASL authentication
step
I0127 00:39:33.153442 1814 slave.cpp:964] Detecting new master
I0127 00:39:33.153686 1815 authenticator.cpp:318] Authentication success
I0127 00:39:33.154338 1813 authenticatee.cpp:299] Authentication success
I0127 00:39:33.154717 1818 master.cpp:6872] Successfully authenticated
principal 'test-principal' at
[email protected]:51845
I0127 00:39:33.155275 1814 sched.cpp:513] Successfully authenticated with
master [email protected]:51845
I0127 00:39:33.155483 1819 master.cpp:2707] Received SUBSCRIBE call for
framework 'default' at
[email protected]:51845
I0127 00:39:33.155555 1819 master.cpp:2155] Authorizing framework principal
'test-principal' to receive offers for role '*'
I0127 00:39:33.156003 1819 master.cpp:2783] Subscribing framework default with
checkpointing disabled and capabilities [ ]
I0127 00:39:33.156581 1814 hierarchical.cpp:271] Added framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.156581 1819 sched.cpp:759] Framework registered with
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.163875 1818 slave.cpp:991] Authenticating with master
[email protected]:51845
I0127 00:39:33.163997 1818 slave.cpp:1002] Using default CRAM-MD5 authenticatee
I0127 00:39:33.164427 1818 authenticatee.cpp:121] Creating new client SASL
connection
I0127 00:39:33.164808 1818 master.cpp:6842] Authenticating
slave(8)@10.0.2.15:51845
I0127 00:39:33.165102 1818 authenticator.cpp:98] Creating new server SASL
connection
I0127 00:39:33.165536 1818 authenticatee.cpp:213] Received SASL authentication
mechanisms: CRAM-MD5
I0127 00:39:33.165603 1818 authenticatee.cpp:239] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0127 00:39:33.165796 1813 authenticator.cpp:204] Received SASL authentication
start
I0127 00:39:33.165879 1813 authenticator.cpp:326] Authentication requires more
steps
I0127 00:39:33.165999 1813 authenticatee.cpp:259] Received SASL authentication
step
I0127 00:39:33.166175 1816 authenticator.cpp:232] Received SASL authentication
step
I0127 00:39:33.166364 1816 authenticator.cpp:318] Authentication success
I0127 00:39:33.166671 1813 master.cpp:6872] Successfully authenticated
principal 'test-principal' at slave(8)@10.0.2.15:51845
I0127 00:39:33.166739 1816 authenticatee.cpp:299] Authentication success
I0127 00:39:33.167352 1817 slave.cpp:1086] Successfully authenticated with
master [email protected]:51845
I0127 00:39:33.167836 1816 master.cpp:5232] Registering agent at
slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with id
ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.168298 1816 registrar.cpp:461] Applied 1 operations in 62732ns;
attempting to update the registry
I0127 00:39:33.169097 1820 registrar.cpp:506] Successfully updated the
registry in 716032ns
I0127 00:39:33.170994 1813 master.cpp:5303] Registered agent
ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
I0127 00:39:33.171192 1815 slave.cpp:1132] Registered with master
[email protected]:51845; given agent ID ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.173738 1814 status_update_manager.cpp:184] Resuming sending
status updates
I0127 00:39:33.174046 1815 slave.cpp:1198] Forwarding total oversubscribed
resources {}
I0127 00:39:33.174124 1817 hierarchical.cpp:478] Added agent
ac440d30-722b-43a5-9f61-cea98b3e576a-S0 (vagrant-ubuntu-trusty-64) with
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I0127 00:39:33.174309 1815 master.cpp:5710] Received update of agent
ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64) with total oversubscribed resources {}
I0127 00:39:33.176139 1817 hierarchical.cpp:548] Agent
ac440d30-722b-43a5-9f61-cea98b3e576a-S0 (vagrant-ubuntu-trusty-64) updated with
oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000])
I0127 00:39:33.176378 1814 master.cpp:6671] Sending 1 offers to framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at
[email protected]:51845
I0127 00:39:33.178370 1818 master.cpp:3661] Processing ACCEPT call for offers:
[ ac440d30-722b-43a5-9f61-cea98b3e576a-O0 ] on agent
ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64) for framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at
[email protected]:51845
I0127 00:39:33.178455 1818 master.cpp:3249] Authorizing framework principal
'test-principal' to launch task 1
I0127 00:39:33.178591 1818 master.cpp:3249] Authorizing framework principal
'test-principal' to launch task 2
W0127 00:39:33.181143 1814 validation.cpp:995] Executor 'executor-1' for task
'1' uses less CPUs (None) than the minimum required (0.01). Please update your
executor, as this will be mandatory in future releases.
W0127 00:39:33.181447 1814 validation.cpp:1007] Executor 'executor-1' for task
'1' uses less memory (None) than the minimum required (32MB). Please update
your executor, as this will be mandatory in future releases.
I0127 00:39:33.181901 1814 master.cpp:8584] Adding task 1 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at
slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.182237 1814 master.cpp:4311] Launching task 1 of framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at
[email protected]:51845 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at
slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.182725 1815 slave.cpp:1576] Got assigned task '1' for framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
W0127 00:39:33.183140 1814 validation.cpp:995] Executor 'executor-2' for task
'2' uses less CPUs (None) than the minimum required (0.01). Please update your
executor, as this will be mandatory in future releases.
W0127 00:39:33.183409 1814 validation.cpp:1007] Executor 'executor-2' for task
'2' uses less memory (None) than the minimum required (32MB). Please update
your executor, as this will be mandatory in future releases.
I0127 00:39:33.183221 1815 slave.cpp:1736] Launching task '1' for framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.184008 1815 paths.cpp:547] Trying to chown
'/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2'
to user 'vagrant'
I0127 00:39:33.184008 1814 master.cpp:8584] Adding task 2 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at
slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.184370 1815 slave.cpp:6350] Launching executor 'executor-1' of
framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 with resources {} in work
directory
'/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2'
I0127 00:39:33.184882 1814 master.cpp:4311] Launching task 2 of framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at
[email protected]:51845 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at
slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.185616 1815 slave.cpp:2058] Queued task '1' for executor
'executor-1' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.185811 1815 slave.cpp:1576] Got assigned task '2' for framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186208 1815 slave.cpp:1736] Launching task '2' for framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186472 1815 paths.cpp:547] Trying to chown
'/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99'
to user 'vagrant'
I0127 00:39:33.187806 1815 slave.cpp:6350] Launching executor 'executor-2' of
framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 with resources {} in work
directory
'/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99'
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe711d700 (LWP 1815)]
__GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
85 getenv.c: No such file or directory.
(gdb) inf locals
ep_start = <error reading variable ep_start (Cannot access memory at address
0x110)>
len = 11
ep = 0x2da66c0
name_start = 18764
(gdb) bt
#0 __GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
#1 0x0000000000affbce in os::getenv ()
#2 0x00007ffff5a8fe91 in mesos::internal::slave::executorEnvironment () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#3 0x00007ffff5a8ad9a in mesos::internal::slave::Framework::launchExecutor ()
from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#4 0x00007ffff5a65a47 in mesos::internal::slave::Slave::_run () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#5 0x00007ffff5abdc0d in void process::dispatch<mesos::internal::slave::Slave,
process::Future<bool> const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo
const&, Option<mesos::TaskInfo> const&, Option<mesos::TaskGroupInfo> const&,
process::Future<bool>, mesos::FrameworkInfo, mesos::ExecutorInfo,
Option<mesos::TaskInfo>, Option<mesos::TaskGroupInfo>
>(process::PID<mesos::internal::slave::Slave> const&, void
(mesos::internal::slave::Slave::*)(process::Future<bool> const&,
mesos::FrameworkInfo const&, mesos::ExecutorInfo const&,
Option<mesos::TaskInfo> const&, Option<mesos::TaskGroupInfo> const&),
process::Future<bool>, mesos::FrameworkInfo, mesos::ExecutorInfo,
Option<mesos::TaskInfo>,
Option<mesos::TaskGroupInfo>)::{lambda(process::ProcessBase*)#1}::operator()(process::ProcessBase*)
const () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#6 0x00007ffff5af1de9 in std::_Function_handler<void (process::ProcessBase*),
void process::dispatch<mesos::internal::slave::Slave, process::Future<bool>
const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&,
Option<mesos::TaskInfo> const&, Option<mesos::TaskGroupInfo> const&,
process::Future<bool>, mesos::FrameworkInfo, mesos::ExecutorInfo,
Option<mesos::TaskInfo>, Option<mesos::TaskGroupInfo>
>(process::PID<mesos::internal::slave::Slave> const&, void
(mesos::internal::slave::Slave::*)(process::Future<bool> const&,
mesos::FrameworkInfo const&, mesos::ExecutorInfo const&,
Option<mesos::TaskInfo> const&, Option<mesos::TaskGroupInfo> const&),
process::Future<bool>, mesos::FrameworkInfo, mesos::ExecutorInfo,
Option<mesos::TaskInfo>,
Option<mesos::TaskGroupInfo>)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#7 0x00007ffff67e3a2b in std::function<void
(process::ProcessBase*)>::operator()(process::ProcessBase*) const () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#8 0x00007ffff67c982d in process::ProcessBase::visit () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#9 0x00007ffff67d40ac in process::DispatchEvent::visit () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#10 0x0000000000ad3f14 in process::ProcessBase::serve ()
#11 0x00007ffff67c5b1a in process::ProcessManager::resume () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#12 0x00007ffff67c235e in operator() () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#13 0x00007ffff67d37e6 in _M_invoke<>(void) () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#14 0x00007ffff67d373d in operator() () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#15 0x00007ffff67d36d6 in _M_run () from
/home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#16 0x00007ffff096ea60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff018b184 in start_thread (arg=0x7fffe711d700) at
pthread_create.c:312
#18 0x00007fffefeb837d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}
If we look at {{getenv.c}}, we find the following:
{code}
26 /* Return the value of the environment variable NAME. This implementation
27    is tuned a bit in that it assumes no environment variable has an empty
28    name which of course should always be true. We have a special case for
29    one character names so that for the general case we can assume at least
30    two characters which we can access. By doing this we can avoid using the
31    `strncmp' most of the time. */
32 char *
33 getenv (name)
34      const char *name;
35 {
36   size_t len = strlen (name);
37   char **ep;
38   uint16_t name_start;
39
40   if (__environ == NULL || name[0] == '\0')
41     return NULL;
42
43   if (name[1] == '\0')
44     {
45       /* The name of the variable consists of only one character. Therefore
46          the first two characters of the environment entry are this character
47          and a '=' character. */
48 #if __BYTE_ORDER == __LITTLE_ENDIAN || !_STRING_ARCH_unaligned
49       name_start = ('=' << 8) | *(const unsigned char *) name;
50 #else
51       name_start = '=' | ((*(const unsigned char *) name) << 8);
52 #endif
53       for (ep = __environ; *ep != NULL; ++ep)
54         {
55 #if _STRING_ARCH_unaligned
56           uint16_t ep_start = *(uint16_t *) *ep;
57 #else
58           uint16_t ep_start = (((unsigned char *) *ep)[0]
59                                | (((unsigned char *) *ep)[1] << 8));
60 #endif
61           if (name_start == ep_start)
62             return &(*ep)[2];
63         }
64     }
65   else
66     {
67 #if _STRING_ARCH_unaligned
68       name_start = *(const uint16_t *) name;
69 #else
70       name_start = (((const unsigned char *) name)[0]
71                     | (((const unsigned char *) name)[1] << 8));
72 #endif
73       len -= 2;
74       name += 2;
75
76       for (ep = __environ; *ep != NULL; ++ep)
77         {
78 #if _STRING_ARCH_unaligned
79           uint16_t ep_start = *(uint16_t *) *ep;
80 #else
81           uint16_t ep_start = (((unsigned char *) *ep)[0]
82                                | (((unsigned char *) *ep)[1] << 8));
83 #endif
84
85           if (name_start == ep_start && !strncmp (*ep + 2, name, len)
86               && (*ep)[len + 2] == '=')
87             return &(*ep)[len + 3];
88         }
89     }
90
91   return NULL;
92 }
93 libc_hidden_def (getenv)
{code}
Sure enough, line 85 uses {{ep_start}}, a value loaded just above (lines 78-83)
from an entry of the array pointed to by the global {{__environ}}. When we
create a subprocess, we pass {{char** envp}} directly from the parent process
into the cloned process, and then temporarily reassign the child process's
{{environ}} pointer while we perform {{execvp}}:
{code}
inline int execvpe(const char* file, char** argv, char** envp)
{
  char** saved = os::raw::environment();
  *os::raw::environmentp() = envp;
  int result = execvp(file, argv);
  *os::raw::environmentp() = saved;
  return result;
}
{code}
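To make the effect of that swap concrete, here is a minimal sketch. It is an illustration only, not Mesos code: the helper {{lookupWithSwappedEnviron}} and the variable name {{SKETCH_VAR}} are made up for this example. It shows that {{::getenv}} reads through the global {{environ}} pointer, so any lookup made while the pointer is swapped scans the swapped-in array instead of the original one. It makes no claim about which thread or process actually observes the swap in the failing tests.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// The process-global environment pointer that getenv() walks.
extern char** environ;

// Hypothetical helper (not part of Mesos or stout): look up `name` while
// `environ` temporarily points at `envp`, then restore the saved pointer.
// This mirrors the save / reassign / restore pattern in execvpe above.
std::string lookupWithSwappedEnviron(char** envp, const char* name)
{
  assert(envp != nullptr);

  char** saved = environ;
  environ = envp;                      // Swap in the caller-supplied array.

  const char* value = ::getenv(name);  // Scans `envp`, not the original.
  std::string result = (value != nullptr) ? value : "";

  environ = saved;                     // Restore the original pointer.
  return result;
}
```

Calling the helper with a NULL-terminated array such as {{ {"SKETCH_VAR=from-envp", NULL} }} and the name {{SKETCH_VAR}} yields {{from-envp}}, even though the variable was never set in the real environment. The lookup depends entirely on whatever {{environ}} points at when {{getenv}} runs, and on that array and its entries remaining valid for the duration of the scan.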
> os::getenv() can segfault
> -------------------------
>
> Key: MESOS-6985
> URL: https://issues.apache.org/jira/browse/MESOS-6985
> Project: Mesos
> Issue Type: Bug
> Components: stout
> Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without
> libevent/SSL
> Reporter: Greg Mann
> Labels: stout
> Attachments:
> MasterMaintenanceTest.InverseOffersFilters-truncated.txt,
> MasterTest.MultipleExecutors.txt
>
>
> This was observed on ASF CI. The segfault first showed up on CI on 9/20/16
> and has been produced by the tests {{MasterTest.MultipleExecutors}} and
> {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases,
> {{os::getenv()}} segfaults with the same stack trace:
> {code}
> *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are
> using GNU date ***
> PC: @ 0x2ad59e3ae82d (unknown)
> I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0
> *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240;
> stack trace: ***
> I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at:
> executor(75)@172.17.0.2:45752 with pid 28591
> @ 0x2ad5ab953197 (unknown)
> @ 0x2ad5ab957479 (unknown)
> @ 0x2ad59e165330 (unknown)
> @ 0x2ad59e3ae82d (unknown)
> @ 0x2ad594631358 os::getenv()
> @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment()
> @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor()
> @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run()
> @ 0x2ad59ac1ec10
> _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_
> @ 0x2ad59ac1e6bf
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2ad59bce2304 std::function<>::operator()()
> @ 0x2ad59bcc9824 process::ProcessBase::visit()
> @ 0x2ad59bd4028e process::DispatchEvent::visit()
> @ 0x2ad594616df1 process::ProcessBase::serve()
> @ 0x2ad59bcc72b7 process::ProcessManager::resume()
> @ 0x2ad59bcd567c
> process::ProcessManager::init_threads()::$_2::operator()()
> @ 0x2ad59bcd5585
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2ad59bcd5555 std::_Bind_simple<>::operator()()
> @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run()
> @ 0x2ad59d9e6a60 (unknown)
> @ 0x2ad59e15d184 start_thread
> @ 0x2ad59e46d37d (unknown)
> make[4]: *** [check-local] Segmentation fault
> {code}
> Find attached the full log from a failed run of
> {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of
> {{MasterMaintenanceTest.InverseOffersFilters}}.