[ https://issues.apache.org/jira/browse/MESOS-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763536#comment-13763536 ]

Vinson Lee commented on MESOS-685:
----------------------------------

[~vinodkone] This is 100% reproducible for me on CentOS, Fedora, and Ubuntu.

{noformat}
$ make MESOS_VERBOSE=1 check
[...]
[ RUN      ] SlaveRecoveryTest/0.RecoveryTimeout
I0910 14:27:54.909389  3894 master.cpp:262] Master started on 127.0.1.1:54558
I0910 14:27:54.909744  3894 master.cpp:277] Master ID: 
201309101427-16842879-54558-3878
I0910 14:27:54.910044  3894 master.cpp:642] Elected as master!
I0910 14:27:54.910264  3894 master.cpp:692] Registering framework 
201309101427-16842879-54558-3878-0000 at scheduler(19)@127.0.1.1:54558
I0910 14:27:54.910408  3894 hierarchical_allocator_process.hpp:321] Added 
framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.909520  3895 slave.cpp:108] Slave started on 23)@127.0.1.1:54558
I0910 14:27:54.910548  3895 slave.cpp:208] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0910 14:27:54.910936  3895 slave.cpp:544] New master detected at 
master@127.0.1.1:54558
I0910 14:27:54.911054  3895 slave.cpp:559] Postponing registration until 
recovery is complete
I0910 14:27:54.911126  3895 process_isolator.cpp:314] Recovering isolator
I0910 14:27:54.911223  3895 status_update_manager.cpp:157] New master detected 
at master@127.0.1.1:54558
W0910 14:27:54.910121  3893 master.cpp:80] No whitelist given. Advertising 
offers for all slaves
I0910 14:27:54.911361  3894 slave.cpp:399] Finished recovery
I0910 14:27:54.911582  3894 master.cpp:1065] Attempting to register slave on 
slave-raring at slave(23)@127.0.1.1:54558
I0910 14:27:54.911640  3894 master.cpp:2135] Adding slave 
201309101427-16842879-54558-3878-0 at slave-raring with cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000]
I0910 14:27:54.911752  3894 slave.cpp:604] Registered with master 
master@127.0.1.1:54558; given slave ID 201309101427-16842879-54558-3878-0
I0910 14:27:54.911993  3894 paths.hpp:369] Created slave directory 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0'
I0910 14:27:54.911829  3895 hierarchical_allocator_process.hpp:434] Added slave 
201309101427-16842879-54558-3878-0 (slave-raring) with cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000] available)
I0910 14:27:54.912286  3895 master.cpp:1445] Sending 1 offers to framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:54.913525  3895 master.cpp:1682] Processing reply for offer 
201309101427-16842879-54558-3878-0 on slave 201309101427-16842879-54558-3878-0 
(slave-raring) for framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.913746  3895 master.hpp:318] Adding task 
94446522-a4aa-479c-996e-e198c15e3336 with resources cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000] on slave 
201309101427-16842879-54558-3878-0 (slave-raring)
I0910 14:27:54.913821  3895 master.cpp:1802] Launching task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 with resources cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000] on slave 
201309101427-16842879-54558-3878-0 (slave-raring)
I0910 14:27:54.913645  3894 slave.cpp:617] Checkpointing SlaveInfo to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/slave.info'
I0910 14:27:54.914192  3894 slave.cpp:773] Got assigned task 
94446522-a4aa-479c-996e-e198c15e3336 for framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:54.914332  3894 slave.cpp:2821] Checkpointing FrameworkInfo to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.info'
I0910 14:27:54.914579  3894 slave.cpp:2828] Checkpointing framework pid 
'scheduler(19)@127.0.1.1:54558' to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.pid'
I0910 14:27:54.914794  3894 slave.cpp:884] Launching task 
94446522-a4aa-479c-996e-e198c15e3336 for framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:54.915848  3894 paths.hpp:336] Created executor directory 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.915993  3894 slave.cpp:3068] Checkpointing ExecutorInfo to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/executor.info'
I0910 14:27:54.916311  3894 paths.hpp:336] Created executor directory 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.916664  3895 process_isolator.cpp:100] Launching 
94446522-a4aa-479c-996e-e198c15e3336 (mesos/src/mesos-executor) in 
/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de
 with resources ' for framework 201309101427-16842879-54558-3878-0000
I0910 14:27:54.917760  3894 slave.cpp:3156] Checkpointing TaskInfo to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/tasks/94446522-a4aa-479c-996e-e198c15e3336/task.info'
I0910 14:27:54.918025  3894 slave.cpp:995] Queuing task 
'94446522-a4aa-479c-996e-e198c15e3336' for executor 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
'201309101427-16842879-54558-3878-0000
I0910 14:27:54.918128  3894 slave.cpp:526] Successfully attached file 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:54.921766  3895 process_isolator.cpp:163] Forked executor at 4184
Checkpointing executor's forked pid 4184 to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/pids/forked.pid'
Fetching resources into 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
Command terminated with signal Killed (pid: 4106)
I0910 14:27:54.984256  3895 slave.cpp:1441] Got registration for executor 
'94446522-a4aa-479c-996e-e198c15e3336' of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:54.984431  3895 slave.cpp:1526] Checkpointing executor pid 
'executor(1)@127.0.1.1:54291' to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de/pids/libprocess.pid'
I0910 14:27:54.984848  3895 slave.cpp:1562] Flushing queued task 
94446522-a4aa-479c-996e-e198c15e3336 for executor 
'94446522-a4aa-479c-996e-e198c15e3336' of framework 
201309101427-16842879-54558-3878-0000
Registered executor on slave-raring
Starting task 94446522-a4aa-479c-996e-e198c15e3336
sh -c 'sleep 1000'
Forked command at 4221
I0910 14:27:54.989303  3892 slave.cpp:1772] Handling status update TASK_RUNNING 
(UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 from executor(1)@127.0.1.1:54291
I0910 14:27:54.989472  3892 status_update_manager.cpp:300] Received status 
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:54.989650  3892 status_update_manager.hpp:337] Checkpointing UPDATE 
for status update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for 
task 94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.061668  3892 status_update_manager.cpp:351] Forwarding status 
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 to master@127.0.1.1:54558
I0910 14:27:55.062222  3892 master.cpp:1211] Status update TASK_RUNNING (UUID: 
d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 from slave(23)@127.0.1.1:54558
I0910 14:27:55.062423  3892 slave.cpp:1897] Sending acknowledgement for status 
update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 to executor(1)@127.0.1.1:54291
I0910 14:27:55.063185  3892 status_update_manager.cpp:375] Received status 
update acknowledgement (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for task 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.063282  3892 status_update_manager.hpp:337] Checkpointing ACK 
for status update TASK_RUNNING (UUID: d0261e6f-cbf0-4047-95d8-58981b6f7e08) for 
task 94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.128672  3878 slave.cpp:454] Slave terminating
I0910 14:27:55.129102  3893 master.cpp:550] Slave 
201309101427-16842879-54558-3878-0 (slave-raring) disconnected
I0910 14:27:55.129261  3893 hierarchical_allocator_process.hpp:459] Removed 
slave 201309101427-16842879-54558-3878-0
I0910 14:27:55.130692  3893 slave.cpp:108] Slave started on 24)@127.0.1.1:54558
I0910 14:27:55.130841  3893 slave.cpp:208] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0910 14:27:55.131306  3893 state.cpp:33] Recovering state from 
/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta
I0910 14:27:55.132431  3893 slave.cpp:2760] Recovering framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.132691  3893 slave.cpp:2936] Recovering executor 
'94446522-a4aa-479c-996e-e198c15e3336' of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.133411  3893 slave.cpp:544] New master detected at 
master@127.0.1.1:54558
I0910 14:27:55.133577  3895 status_update_manager.cpp:179] Recovering status 
update manager
I0910 14:27:55.133677  3895 status_update_manager.cpp:183] Recovering executor 
'94446522-a4aa-479c-996e-e198c15e3336' of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.134037  3895 process_isolator.cpp:314] Recovering isolator
I0910 14:27:55.134218  3895 process_isolator.cpp:322] Recovering executor 
'94446522-a4aa-479c-996e-e198c15e3336' of framework 
201309101427-16842879-54558-3878-0000
I0910 14:27:55.134697  3893 slave.cpp:559] Postponing registration until 
recovery is complete
I0910 14:27:55.134790  3893 slave.cpp:526] Successfully attached file 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/executors/94446522-a4aa-479c-996e-e198c15e3336/runs/48981d17-6dfa-429a-b555-3cfdf423a8de'
I0910 14:27:55.134748  3895 status_update_manager.cpp:157] New master detected 
at master@127.0.1.1:54558
I0910 14:27:55.134944  3893 slave.cpp:2710] Sending reconnect request to 
executor 94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000 at executor(1)@127.0.1.1:54291
I0910 14:27:55.138732  3893 slave.cpp:1606] Re-registering executor 
94446522-a4aa-479c-996e-e198c15e3336 of framework 
201309101427-16842879-54558-3878-0000
Re-registered executor on slave-raring
I0910 14:27:59.131731  3894 slave.cpp:1720] Cleaning up un-reregistered 
executors
I0910 14:27:59.131906  3894 slave.cpp:399] Finished recovery
W0910 14:27:59.132184  3894 master.cpp:1120] Slave at slave(24)@127.0.1.1:54558 
(slave-raring) is being allowed to re-register with an already in use id 
(201309101427-16842879-54558-3878-0)
I0910 14:27:59.132514  3894 hierarchical_allocator_process.hpp:434] Added slave 
201309101427-16842879-54558-3878-0 (slave-raring) with cpus(*):2; mem(*):1024; 
disk(*):1024; ports(*):[31000-32000] (and  available)
I0910 14:27:59.132619  3894 slave.cpp:645] Re-registered with master 
master@127.0.1.1:54558
I0910 14:27:59.132674  3894 slave.cpp:1333] Updating framework 
201309101427-16842879-54558-3878-0000 pid to scheduler(19)@127.0.1.1:54558
I0910 14:27:59.132788  3894 slave.cpp:1341] Checkpointing framework pid 
'scheduler(19)@127.0.1.1:54558' to 
'/tmp/SlaveRecoveryTest_0_RecoveryTimeout_4VHyyY/meta/slaves/201309101427-16842879-54558-3878-0/frameworks/201309101427-16842879-54558-3878-0000/framework.pid'
W0910 14:27:59.912508  3894 master.cpp:80] No whitelist given. Advertising 
offers for all slaves
W0910 14:28:04.913756  3895 master.cpp:80] No whitelist given. Advertising 
offers for all slaves
tests/slave_recovery_tests.cpp:776: Failure
Failed to wait 10secs for status
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=3878, tid=46912628471552
#
# JRE version: 7.0_25-b30
# Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed 
oops)
# Problematic frame:
# C  0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# mesos/src/hs_err_pid3878.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
#
Shutting down
Killing process tree at pid 4221
make[3]: *** [check-local] Aborted
make[3]: Leaving directory `mesos/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `mesos/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `mesos/src'
make: *** [check-recursive] Error 1
Killed the following process trees:
[ 
-+- 4221 sh -c sleep 1000 
 \--- 4222 sleep 1000 
]
Command terminated with signal Killed (pid: 4221)
{noformat}
                
> SlaveRecoveryTest/0.RecoveryTimeout Java SIGSEGV
> ------------------------------------------------
>
>                 Key: MESOS-685
>                 URL: https://issues.apache.org/jira/browse/MESOS-685
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.15.0
>         Environment: Linux
>            Reporter: Vinson Lee
>         Attachments: hs_err_pid2204.log
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
