[
https://issues.apache.org/jira/browse/MESOS-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763536#comment-13763536
]
Vinson Lee commented on MESOS-685:
----------------------------------
[~vinodkone] This is 100% reproducible for me on CentOS, Fedora, and Ubuntu.
{noformat}
$ make MESOS_VERBOSE=1 check
[...]
[ RUN ] SlaveRecoveryTest/0.RecoveryTimeout
I0910 14:27:54.909389 3894 master.cpp:262] Master started on 127.0.1.1:54558
I0910 14:27:54.909744 3894 master.cpp:277] Master ID:
201309101427-16842879-54558-3878
I0910 14:27:54.910044 3894 master.cpp:642] Elected as master!
I0910 14:27:54.910264 3894 master.cpp:692] Registering framework
201309101427-16842879-54558-3878-0000 at scheduler(19)@127.0.1.1:54558
I0910 14:27:54.910408 3894 hierarchical_allocator_process.hpp:321] Added
framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.909520 3895 slave.cpp:108] Slave started on 23)@127.0.1.1:54558
I0910 14:27:54.910548 3895 slave.cpp:208] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0910 14:27:54.910936 3895 slave.cpp:544] New master detected at
[email protected]:54558
I0910 14:27:54.911054 3895 slave.cpp:559] Postponing registration until
recovery is complete
I0910 14:27:54.911126 3895 process_isolator.cpp:314] Recovering isolator
I0910 14:27:54.911223 3895 status_update_manager.cpp:157] New master detected
at [email protected]:54558
W0910 14:27:54.910121 3893 master.cpp:80] No whitelist given. Advertising
offers for all slaves
I0910 14:27:54.911361 3894 slave.cpp:399] Finished recovery
I0910 14:27:54.911582 3894 master.cpp:1065] Attempting to register slave on
slave-raring at slave(23)@127.0.1.1:54558
I0910 14:27:54.911640 3894 master.cpp:2135] Adding slave
201309101427-16842879-54558-3878-0 at slave-raring with cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000]
I0910 14:27:54.911752 3894 slave.cpp:604] Registered with master
[email protected]:54558; given slave ID 201309101427-16842879-54558-3878-0
I0910 14:27:54.911993 3894 paths.hpp:369] Created slave directory
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0'
I0910 14:27:54.911829 3895 hierarchical_allocator_process.hpp:434] Added slave
201309101427-16842879-54558-3878-0 (slave-raring) with cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000] available)
I0910 14:27:54.912286 3895 master.cpp:1445] Sending 1 offers to framework
201309101427-16842879-54558-3878-0000
I0910 14:27:54.913525 3895 master.cpp:1682] Processing reply for offer
201309101427-16842879-54558-3878-0 on slave 201309101427-16842879-54558-3878-0
(slave-raring) for framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.913746 3895 master.hpp:318] Adding task
94446522-a4aa-479c-996e-e198c15e3336 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
201309101427-16842879-54558-3878-0 (slave-raring)
I0910 14:27:54.913821 3895 master.cpp:1802] Launching task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 with resources cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] on slave
201309101427-16842879-54558-3878-0 (slave-raring)
I0910 14:27:54.913645 3894 slave.cpp:617] Checkpointing SlaveInfo to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/slave.info'
I0910 14:27:54.914192 3894 slave.cpp:773] Got assigned task
94446522-a4aa-479c-996e-e198c15e3336 for framework
201309101427-16842879-54558-3878-0000
I0910 14:27:54.914332 3894 slave.cpp:2821] Checkpointing FrameworkInfo to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.info'
I0910 14:27:54.914579 3894 slave.cpp:2828] Checkpointing framework pid
'scheduler(19)@127.0.1.1:54558' to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.pid'
I0910 14:27:54.914794 3894 slave.cpp:884] Launching task
94446522-a4aa-479c-996e-e198c15e3336 for framework
201309101427-16842879-54558-3878-0000
I0910 14:27:54.915848 3894 paths.hpp:336] Created executor directory
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.915993 3894 slave.cpp:3068] Checkpointing ExecutorInfo to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/executor.info'
I0910 14:27:54.916311 3894 paths.hpp:336] Created executor directory
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.916664 3895 process_isolator.cpp:100] Launching
94446522-a4aa-479c-996e-e198c15e3336 (mesos/src/mesos-executor) in
/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de
with resources ' for framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.917760 3894 slave.cpp:3156] Checkpointing TaskInfo to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/tasks/94446522-a4aa-479c-996e-e198c15e3336/task.info'
I0910 14:27:54.918025 3894 slave.cpp:995] Queuing task
'94446522-a4aa-479c-996e-e198c15e3336' for executor
94446522-a4aa-479c-996e-e198c15e3336 of framework
'201309101427-16842879-54558-3878-0000
I0910 14:27:54.918128 3894 slave.cpp:526] Successfully attached file
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.921766 3895 process_isolator.cpp:163] Forked executor at 4184
Checkpointing executor's forked pid 4184 to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/pids/forked.pid'
Fetching resources into
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
Command terminated with signal Killed (pid: 4106)
I0910 14:27:54.984256 3895 slave.cpp:1441] Got registration for executor
'94446522-a4aa-479c-996e-e198c15e3336' of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:54.984431 3895 slave.cpp:1526] Checkpointing executor pid
'executor(1)@127.0.1.1:54291' to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/pids/libprocess.pid'
I0910 14:27:54.984848 3895 slave.cpp:1562] Flushing queued task
94446522-a4aa-479c-996e-e198c15e3336 for executor
'94446522-a4aa-479c-996e-e198c15e3336' of framework
201309101427-16842879-54558-3878-0000
Registered executor on slave-raring
Starting task 94446522-a4aa-479c-996e-e198c15e3336
sh -c 'sleep 1000'
Forked command at 4221
I0910 14:27:54.989303 3892 slave.cpp:1772] Handling status update TASK_RUNNING
(UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 from executor(1)@127.0.1.1:54291
I0910 14:27:54.989472 3892 status_update_manager.cpp:300] Received status
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:54.989650 3892 status_update_manager.hpp:337] Checkpointing UPDATE
for status update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for
task 94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.061668 3892 status_update_manager.cpp:351] Forwarding status
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 to [email protected]:54558
I0910 14:27:55.062222 3892 master.cpp:1211] Status update TASK_RUNNING (UUID:
d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 from slave(23)@127.0.1.1:54558
I0910 14:27:55.062423 3892 slave.cpp:1897] Sending acknowledgement for status
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 to executor(1)@127.0.1.1:54291
I0910 14:27:55.063185 3892 status_update_manager.cpp:375] Received status
update acknowledgement (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.063282 3892 status_update_manager.hpp:337] Checkpointing ACK
for status update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for
task 94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.128672 3878 slave.cpp:454] Slave terminating
I0910 14:27:55.129102 3893 master.cpp:550] Slave
201309101427-16842879-54558-3878-0 (slave-raring) disconnected
I0910 14:27:55.129261 3893 hierarchical_allocator_process.hpp:459] Removed
slave 201309101427-16842879-54558-3878-0
I0910 14:27:55.130692 3893 slave.cpp:108] Slave started on 24)@127.0.1.1:54558
I0910 14:27:55.130841 3893 slave.cpp:208] Slave resources: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0910 14:27:55.131306 3893 state.cpp:33] Recovering state from
/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta
I0910 14:27:55.132431 3893 slave.cpp:2760] Recovering framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.132691 3893 slave.cpp:2936] Recovering executor
'94446522-a4aa-479c-996e-e198c15e3336' of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.133411 3893 slave.cpp:544] New master detected at
[email protected]:54558
I0910 14:27:55.133577 3895 status_update_manager.cpp:179] Recovering status
update manager
I0910 14:27:55.133677 3895 status_update_manager.cpp:183] Recovering executor
'94446522-a4aa-479c-996e-e198c15e3336' of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.134037 3895 process_isolator.cpp:314] Recovering isolator
I0910 14:27:55.134218 3895 process_isolator.cpp:322] Recovering executor
'94446522-a4aa-479c-996e-e198c15e3336' of framework
201309101427-16842879-54558-3878-0000
I0910 14:27:55.134697 3893 slave.cpp:559] Postponing registration until
recovery is complete
I0910 14:27:55.134790 3893 slave.cpp:526] Successfully attached file
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:55.134748 3895 status_update_manager.cpp:157] New master detected
at [email protected]:54558
I0910 14:27:55.134944 3893 slave.cpp:2710] Sending reconnect request to
executor 94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000 at executor(1)@127.0.1.1:54291
I0910 14:27:55.138732 3893 slave.cpp:1606] Re-registering executor
94446522-a4aa-479c-996e-e198c15e3336 of framework
201309101427-16842879-54558-3878-0000
Re-registered executor on slave-raring
I0910 14:27:59.131731 3894 slave.cpp:1720] Cleaning up un-reregistered
executors
I0910 14:27:59.131906 3894 slave.cpp:399] Finished recovery
W0910 14:27:59.132184 3894 master.cpp:1120] Slave at slave(24)@127.0.1.1:54558
(slave-raring) is being allowed to re-register with an already in use id
(201309101427-16842879-54558-3878-0)
I0910 14:27:59.132514 3894 hierarchical_allocator_process.hpp:434] Added slave
201309101427-16842879-54558-3878-0 (slave-raring) with cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000] (and available)
I0910 14:27:59.132619 3894 slave.cpp:645] Re-registered with master
[email protected]:54558
I0910 14:27:59.132674 3894 slave.cpp:1333] Updating framework
201309101427-16842879-54558-3878-0000 pid to scheduler(19)@127.0.1.1:54558
I0910 14:27:59.132788 3894 slave.cpp:1341] Checkpointing framework pid
'scheduler(19)@127.0.1.1:54558' to
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.pid'
W0910 14:27:59.912508 3894 master.cpp:80] No whitelist given. Advertising
offers for all slaves
W0910 14:28:04.913756 3895 master.cpp:80] No whitelist given. Advertising
offers for all slaves
tests/slave_recovery_tests.cpp:776: Failure
Failed to wait 10secs for status
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000000000, pid=3878, tid=46912628471552
#
# JRE version: 7.0_25-b30
# Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed
oops)
# Problematic frame:
# C 0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# mesos/src/hs_err_pid3878.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
#
Shutting down
Killing process tree at pid 4221
make[3]: *** [check-local] Aborted
make[3]: Leaving directory `mesos/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `mesos/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `mesos/src'
make: *** [check-recursive] Error 1
Killed the following process trees:
[
-+- 4221 sh -c sleep 1000
\--- 4222 sleep 1000
]
Command terminated with signal Killed (pid: 4221)
{noformat}
> SlaveRecoveryTest/0.RecoveryTimeout Java SIGSEGV
> ------------------------------------------------
>
> Key: MESOS-685
> URL: https://issues.apache.org/jira/browse/MESOS-685
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.15.0
> Environment: Linux
> Reporter: Vinson Lee
> Attachments: hs_err_pid2204.log
>
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira