[
https://issues.apache.org/jira/browse/MESOS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135602#comment-16135602
]
Vinod Kone commented on MESOS-7218:
-----------------------------------
Saw a slightly different bug (double free corruption) for this test in ASF CI.
{code}
[ RUN ] ExamplesTest.PythonFramework
Using temporary directory '/tmp/ExamplesTest_PythonFramework_z8oLwZ'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0818 23:26:03.783149 9113 process.cpp:1393] libprocess is initialized on
172.17.0.2:45305 with 16 worker threads
I0818 23:26:03.783207 9113 logging.cpp:199] Logging to STDERR
I0818 23:26:03.973633 9113 leveldb.cpp:174] Opened db in 186.110939ms
I0818 23:26:04.006614 9113 leveldb.cpp:181] Compacted db in 32.951872ms
I0818 23:26:04.006680 9113 leveldb.cpp:196] Created db iterator in 20157ns
I0818 23:26:04.006700 9113 leveldb.cpp:202] Seeked to beginning of db in 8574ns
I0818 23:26:04.006712 9113 leveldb.cpp:271] Iterated through 0 keys in the db
in 6739ns
I0818 23:26:04.006757 9113 replica.cpp:779] Replica recovered with log
positions 0 -> 0 with 1 holes and 0 unlearned
I0818 23:26:04.007972 9132 recover.cpp:451] Starting replica recovery
I0818 23:26:04.008108 9132 recover.cpp:477] Replica is in EMPTY status
I0818 23:26:04.008359 9113 local.cpp:272] Creating default 'local' authorizer
I0818 23:26:04.008885 9130 replica.cpp:676] Replica in EMPTY status received a
broadcasted recover request from __req_res__(1)@172.17.0.2:45305
I0818 23:26:04.010182 9131 recover.cpp:197] Received a recover response from a
replica in EMPTY status
I0818 23:26:04.010478 9127 recover.cpp:568] Updating replica status to STARTING
I0818 23:26:04.011464 9139 master.cpp:442] Master
82ba1c51-64a2-4202-9dac-081a35935d4e (8ca4e8552e66) started on 172.17.0.2:45305
I0818 23:26:04.011514 9139 master.cpp:444] Flags at startup:
--acls="permissive: false
register_frameworks {
principals {
type: SOME
values: "test-principal"
}
roles {
type: SOME
values: "*"
}
}
run_tasks {
principals {
type: SOME
values: "test-principal"
}
users {
type: SOME
values: "mesos"
}
}
register_agents {
principals {
type: ANY
}
agents {
type: ANY
}
}
" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="false" --authenticate_frameworks="true"
--authenticate_http_frameworks="false" --authenticate_http_readonly="false"
--authenticate_http_readwrite="false" --authenticators="crammd5"
--authorizers="local"
--credentials="/tmp/ExamplesTest_PythonFramework_z8oLwZ/credentials"
--filter_gpu_resources="true" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true"
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
--recovery_agent_removal_limit="100%" --registry="replicated_log"
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="20secs" --registry_strict="false"
--root_submissions="true" --user_sorter="drf" --version="false"
--webui_dir="/mesos/mesos-1.4.0/src/webui"
--work_dir="/tmp/mesos-2Nvz1X/master" --zk_session_timeout="10secs"
I0818 23:26:04.012109 9139 master.cpp:494] Master only allowing authenticated
frameworks to register
I0818 23:26:04.012122 9139 master.cpp:510] Master allowing unauthenticated
agents to register
I0818 23:26:04.012131 9139 master.cpp:524] Master allowing HTTP frameworks to
register without authentication
I0818 23:26:04.012143 9139 credentials.hpp:37] Loading credentials for
authentication from '/tmp/ExamplesTest_PythonFramework_z8oLwZ/credentials'
W0818 23:26:04.012240 9139 credentials.hpp:52] Permissions on credentials file
'/tmp/ExamplesTest_PythonFramework_z8oLwZ/credentials' are too open; it is
recommended that your credentials file is NOT accessible by others
I0818 23:26:04.012315 9139 master.cpp:566] Using default 'crammd5'
authenticator
I0818 23:26:04.012372 9139 authenticator.cpp:520] Initializing server SASL
I0818 23:26:04.014071 9139 auxprop.cpp:73] Initialized in-memory auxiliary
property plugin
I0818 23:26:04.014174 9139 master.cpp:646] Authorization enabled
I0818 23:26:04.014312 9134 hierarchical.cpp:171] Initialized hierarchical
allocator process
I0818 23:26:04.014315 9127 whitelist_watcher.cpp:77] No whitelist given
I0818 23:26:04.015699 9113 resolver.cpp:69] Creating default secret resolver
I0818 23:26:04.016268 9113 containerizer.cpp:246] Using isolation:
filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
W0818 23:26:04.016788 9113 backend.cpp:76] Failed to create 'aufs' backend:
AufsBackend requires root privileges
W0818 23:26:04.016878 9113 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0818 23:26:04.016914 9113 provisioner.cpp:255] Using default backend 'copy'
I0818 23:26:04.019286 9140 slave.cpp:250] Mesos agent started on
(1)@172.17.0.2:45305
I0818 23:26:04.019330 9140 slave.cpp:251] Flags at startup:
--acls="permissive: false
register_frameworks {
principals {
type: SOME
values: "test-principal"
}
roles {
type: SOME
values: "*"
}
}
run_tasks {
principals {
type: SOME
values: "test-principal"
}
users {
type: SOME
values: "mesos"
}
}
register_agents {
principals {
type: ANY
}
agents {
type: ANY
}
}
" --appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false"
--authenticate_http_readwrite="false" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos" --default_role="*"
--disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins"
--docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/mesos-2Nvz1X/agents/0/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix"
--launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resources="cpus:2;mem:10240"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/mesos-2Nvz1X/agents/0/run"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/mesos-2Nvz1X/agents/0/work"
I0818 23:26:04.021750 9113 resolver.cpp:69] Creating default secret resolver
I0818 23:26:04.022028 9113 containerizer.cpp:246] Using isolation:
filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
I0818 23:26:04.022400 9135 master.cpp:2163] Elected as the leading master!
I0818 23:26:04.022433 9135 master.cpp:1702] Recovering from registrar
W0818 23:26:04.022450 9113 backend.cpp:76] Failed to create 'aufs' backend:
AufsBackend requires root privileges
W0818 23:26:04.022500 9113 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0818 23:26:04.022518 9136 registrar.cpp:347] Recovering registrar
I0818 23:26:04.022526 9113 provisioner.cpp:255] Using default backend 'copy'
I0818 23:26:04.024107 9138 slave.cpp:250] Mesos agent started on
(2)@172.17.0.2:45305
I0818 23:26:04.024148 9138 slave.cpp:251] Flags at startup:
--acls="permissive: false
register_frameworks {
principals {
type: SOME
values: "test-principal"
}
roles {
type: SOME
values: "*"
}
}
run_tasks {
principals {
type: SOME
values: "test-principal"
}
users {
type: SOME
values: "mesos"
}
}
register_agents {
principals {
type: ANY
}
agents {
type: ANY
}
}
" --appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false"
--authenticate_http_readwrite="false" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos" --default_role="*"
--disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins"
--docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/mesos-2Nvz1X/agents/1/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix"
--launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resources="cpus:2;mem:10240"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/mesos-2Nvz1X/agents/1/run"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/mesos-2Nvz1X/agents/1/work"
I0818 23:26:04.026486 9113 resolver.cpp:69] Creating default secret resolver
I0818 23:26:04.026756 9113 containerizer.cpp:246] Using isolation:
filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret
W0818 23:26:04.027266 9113 backend.cpp:76] Failed to create 'aufs' backend:
AufsBackend requires root privileges
W0818 23:26:04.027323 9113 backend.cpp:76] Failed to create 'bind' backend:
BindBackend requires root privileges
I0818 23:26:04.027364 9113 provisioner.cpp:255] Using default backend 'copy'
I0818 23:26:04.032902 9133 slave.cpp:250] Mesos agent started on
(3)@172.17.0.2:45305
I0818 23:26:04.033041 9133 slave.cpp:251] Flags at startup:
--acls="permissive: false
register_frameworks {
principals {
type: SOME
values: "test-principal"
}
roles {
type: SOME
values: "*"
}
}
run_tasks {
principals {
type: SOME
values: "test-principal"
}
users {
type: SOME
values: "mesos"
}
}
register_agents {
principals {
type: ANY
}
agents {
type: ANY
}
}
" --appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false"
--authenticate_http_readwrite="false" --authenticatee="crammd5"
--authentication_backoff_factor="1secs" --authorizer="local"
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs"
--containerizers="mesos" --default_role="*"
--disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins"
--docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs"
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
--docker_store_dir="/tmp/mesos/store/docker"
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_reregistration_timeout="2secs"
--executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/mesos-2Nvz1X/agents/2/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
--gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_command_executor="false"
--http_heartbeat_interval="30secs" --initialize_driver_logging="true"
--isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix"
--launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0"
--logging_level="INFO" --max_completed_executors_per_framework="150"
--oversubscribed_resources_interval="15secs" --perf_duration="10secs"
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
--quiet="false" --recover="reconnect" --recovery_timeout="15mins"
--registration_backoff_factor="1secs" --resources="cpus:2;mem:10240"
--revocable_cpu_low_priority="true"
--runtime_dir="/tmp/mesos-2Nvz1X/agents/2/run"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
--systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false"
--work_dir="/tmp/mesos-2Nvz1X/agents/2/work"
I0818 23:26:04.020632 9140 slave.cpp:565] Agent resources:
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":10240.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0818 23:26:04.025518 9138 slave.cpp:565] Agent resources:
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":10240.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0818 23:26:04.034517 9138 slave.cpp:573] Agent attributes: [ ]
I0818 23:26:04.034548 9138 slave.cpp:582] Agent hostname: 8ca4e8552e66
I0818 23:26:04.034569 9140 slave.cpp:573] Agent attributes: [ ]
I0818 23:26:04.034377 9133 slave.cpp:565] Agent resources:
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":10240.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0818 23:26:04.034627 9135 status_update_manager.cpp:177] Pausing sending
status updates
I0818 23:26:04.034641 9133 slave.cpp:573] Agent attributes: [ ]
I0818 23:26:04.034626 9140 slave.cpp:582] Agent hostname: 8ca4e8552e66
I0818 23:26:04.034656 9133 slave.cpp:582] Agent hostname: 8ca4e8552e66
I0818 23:26:04.034734 9129 status_update_manager.cpp:177] Pausing sending
status updates
I0818 23:26:04.034765 9129 status_update_manager.cpp:177] Pausing sending
status updates
I0818 23:26:04.035097 9113 sched.cpp:232] Version: 1.4.0
I0818 23:26:04.035392 9136 sched.cpp:336] New master detected at
[email protected]:45305
I0818 23:26:04.035468 9136 sched.cpp:407] Authenticating with master
[email protected]:45305
I0818 23:26:04.035487 9136 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0818 23:26:04.035712 9126 authenticatee.cpp:97] Initializing client SASL
*** Error in `/usr/bin/python': double free or corruption (fasttop):
0x00002b225c003c50 ***
Enabling authentication for the framework
../../src/tests/script.cpp:83: Failure
Failed
python_framework_test.sh terminated with signal Aborted
[ FAILED ] ExamplesTest.PythonFramework (6643 ms)
[----------] 12 tests from ExamplesTest (85729 ms total)
{code}
> ExamplesTest.PythonFramework is flaky.
> --------------------------------------
>
> Key: MESOS-7218
> URL: https://issues.apache.org/jira/browse/MESOS-7218
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.1.1
> Environment: Ubuntu (12.04, 14.04, 16.04), MacOS Sierra
> Reporter: Till Toenshoff
> Labels: flaky, mesosphere, test
>
> The test appears to be highly flaky (failure rate > 50%) on all tested Ubuntu
> distros. Right now it is unclear to me if this is an issue of the CI I am
> using or an actual test problem or even a proper bug when using Mesos on that
> distribution.
> Also, it fails on Mac OS Sierra with the following stack trace:
> {code}
> [ RUN ] ExamplesTest.PythonFramework
> Using temporary directory
> '/var/folders/v4/0jg38yfd44nb7mzz21c663_40000gn/T/ExamplesTest_PythonFramework_Ic9qX4'
> ../../src/tests/script.cpp:81: Failure
> Failed
> python_framework_test.sh terminated with signal Segmentation fault: 11
> *** Aborted at 1491951380 (unix time) try "date -d @1491951380" if you are
> using GNU date ***
> PC: @ 0x10e9f1170 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 710 (TID 0x7fffd5caa3c0) stack trace: ***
> @ 0x7fffccf4ab3a _sigtramp
> @ 0x8 (unknown)
> @ 0x10e9f0977 testing::internal::AssertHelper::operator=()
> @ 0x10e3c8bfd mesos::internal::tests::execute()
> @ 0x10d3e9241 ExamplesTest_PythonFramework_Test::TestBody()
> @ 0x10ea524aa
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x10ea02df7
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10ea02ca5 testing::Test::Run()
> @ 0x10ea04898 testing::TestInfo::Run()
> @ 0x10ea05de7 testing::TestCase::Run()
> @ 0x10ea15c5c testing::internal::UnitTestImpl::RunAllTests()
> @ 0x10ea548aa
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x10ea15687
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x10ea15558 testing::UnitTest::Run()
> @ 0x10d9b8971 RUN_ALL_TESTS()
> @ 0x10d9b44dd main
> @ 0x7fffccd3b235 start
> Segmentation fault: 11
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)