Zuyu Zhang created MESOS-1606:
---------------------------------

             Summary: Slave failed to checkpoint on Mac OS X
                 Key: MESOS-1606
                 URL: https://issues.apache.org/jira/browse/MESOS-1606
             Project: Mesos
          Issue Type: Bug
          Components: slave
         Environment: Mac OS X, Darwin Kernel Version 13.3.0
            Reporter: Zuyu Zhang


{noformat}
This bug also occurs with test_framework and LowLevelSchedulerLibprocess.

[ RUN      ] ExamplesTest.LowLevelSchedulerPthread
Using temporary directory '/tmp/ExamplesTest_LowLevelSchedulerPthread_SCL6Al'
Enabling authentication for the scheduler
I0715 19:03:59.296200 2019271440 scheduler.cpp:132] Version: 0.20.0
I0715 19:03:59.300429 2019271440 leveldb.cpp:176] Opened db in 1982us
I0715 19:03:59.300900 2019271440 leveldb.cpp:183] Compacted db in 447us
I0715 19:03:59.300946 2019271440 leveldb.cpp:198] Created db iterator in 27us
I0715 19:03:59.300978 2019271440 leveldb.cpp:204] Seeked to beginning of db in 
16us
I0715 19:03:59.301007 2019271440 leveldb.cpp:273] Iterated through 0 keys in 
the db in 20us
I0715 19:03:59.301053 2019271440 replica.cpp:741] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0715 19:03:59.301713 222965760 recover.cpp:425] Starting replica recovery
I0715 19:03:59.301914 222965760 recover.cpp:451] Replica is in EMPTY status
I0715 19:03:59.302671 221892608 replica.cpp:638] Replica in EMPTY status 
received a broadcasted recover request
I0715 19:03:59.302781 224575488 recover.cpp:188] Received a recover response 
from a replica in EMPTY status
I0715 19:03:59.303050 225112064 recover.cpp:542] Updating replica status to 
STARTING
I0715 19:03:59.303432 222965760 leveldb.cpp:306] Persisting metadata (8 bytes) 
to leveldb took 298us
I0715 19:03:59.303475 222965760 replica.cpp:320] Persisted replica status to 
STARTING
I0715 19:03:59.303540 221356032 recover.cpp:451] Replica is in STARTING status
I0715 19:03:59.303797 224575488 master.cpp:288] Master 
20140715-190359-16777343-64313-60122 (localhost) started on 127.0.0.1:64313
I0715 19:03:59.303848 224575488 master.cpp:325] Master only allowing 
authenticated frameworks to register
I0715 19:03:59.303865 224575488 master.cpp:332] Master allowing unauthenticated 
slaves to register
I0715 19:03:59.303884 224575488 credentials.hpp:36] Loading credentials for 
authentication from 
'/tmp/ExamplesTest_LowLevelSchedulerPthread_SCL6Al/credentials'
W0715 19:03:59.303961 224575488 credentials.hpp:51] Permissions on credentials 
file '/tmp/ExamplesTest_LowLevelSchedulerPthread_SCL6Al/credentials' are too 
open. It is recommended that your credentials file is NOT accessible by others.
I0715 19:03:59.304028 224575488 master.cpp:359] Authorization enabled
I0715 19:03:59.304379 223502336 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I0715 19:03:59.304505 2019271440 containerizer.cpp:124] Using isolation: 
posix/cpu,posix/mem
I0715 19:03:59.304666 223502336 recover.cpp:188] Received a recover response 
from a replica in STARTING status
I0715 19:03:59.304805 223502336 recover.cpp:542] Updating replica status to 
VOTING
I0715 19:03:59.305186 223502336 leveldb.cpp:306] Persisting metadata (8 bytes) 
to leveldb took 214us
I0715 19:03:59.305219 223502336 replica.cpp:320] Persisted replica status to 
VOTING
I0715 19:03:59.305250 223502336 recover.cpp:556] Successfully joined the Paxos 
group
I0715 19:03:59.305361 223502336 recover.cpp:440] Recover process terminated
I0715 19:03:59.305927 224038912 slave.cpp:168] Slave started on 
1)@127.0.0.1:64313
I0715 19:03:59.306221 224038912 slave.cpp:279] Slave resources: cpus(*):4; 
mem(*):7168; disk(*):470714; ports(*):[31000-32000]
I0715 19:03:59.306234 2019271440 containerizer.cpp:124] Using isolation: 
posix/cpu,posix/mem
I0715 19:03:59.306248 223502336 master.cpp:1128] The newly elected leader is 
[email protected]:64313 with id 20140715-190359-16777343-64313-60122
I0715 19:03:59.306269 223502336 master.cpp:1141] Elected as the leading master!
I0715 19:03:59.306293 223502336 master.cpp:959] Recovering from registrar
I0715 19:03:59.306395 225112064 registrar.cpp:313] Recovering registrar
I0715 19:03:59.306617 221892608 log.cpp:656] Attempting to start the writer
I0715 19:03:59.306952 224575488 slave.cpp:168] Slave started on 
2)@127.0.0.1:64313
I0715 19:03:59.307158 224575488 slave.cpp:279] Slave resources: cpus(*):4; 
mem(*):7168; disk(*):470714; ports(*):[31000-32000]
I0715 19:03:59.307207 222965760 replica.cpp:474] Replica received implicit 
promise request with proposal 1
I0715 19:03:59.307401 224038912 slave.cpp:324] Slave hostname: localhost
I0715 19:03:59.307459 224038912 slave.cpp:325] Slave checkpoint: true
I0715 19:03:59.307446 222965760 leveldb.cpp:306] Persisting metadata (8 bytes) 
to leveldb took 232us
I0715 19:03:59.307512 222965760 replica.cpp:342] Persisted promised to 1
I0715 19:03:59.307615 224575488 slave.cpp:324] Slave hostname: localhost
I0715 19:03:59.307631 224575488 slave.cpp:325] Slave checkpoint: true
I0715 19:03:59.307802 222965760 coordinator.cpp:230] Coordinator attemping to 
fill missing position
I0715 19:03:59.307924 223502336 state.cpp:33] Recovering state from 
'/var/folders/67/g567hfcj4bjcd_bm3gsqs54h0000gn/T/mesos-XXXXXX.FUk9AYoy/0/meta'
I0715 19:03:59.308027 2019271440 containerizer.cpp:124] Using isolation: 
posix/cpu,posix/mem
I0715 19:03:59.308171 222429184 status_update_manager.cpp:193] Recovering 
status update manager
I0715 19:03:59.308205 225112064 state.cpp:33] Recovering state from 
'/var/folders/67/g567hfcj4bjcd_bm3gsqs54h0000gn/T/mesos-XXXXXX.FUk9AYoy/1/meta'
I0715 19:03:59.308316 221892608 containerizer.cpp:287] Recovering containerizer
I0715 19:03:59.308384 221356032 status_update_manager.cpp:193] Recovering 
status update manager
I0715 19:03:59.308575 225112064 containerizer.cpp:287] Recovering containerizer
I0715 19:03:59.309072 222429184 slave.cpp:3130] Finished recovery
I0715 19:03:59.309079 223502336 slave.cpp:3130] Finished recovery
F0715 19:03:59.309267 222429184 slave.cpp:3141] 
CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to checkpoint 
'1405473915' to 
'/var/folders/67/g567hfcj4bjcd_bm3gsqs54h0000gn/T/mesos-XXXXXX.FUk9AYoy/0/meta/boot_id':
 Failed to open file 
'/var/folders/67/g567hfcj4bjcd_bm3gsqs54h0000gn/T/mesos-XXXXXX.FUk9AYoy/0/meta/boot_id':
 No such file or directory
*** Check failure stack trace: ***
I0715 19:03:59.309270 221892608 replica.cpp:375] Replica received explicit 
promise request for position 0 with proposal 2
I0715 19:03:59.309516 221892608 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 219us
I0715 19:03:59.309502 223502336 slave.cpp:168] Slave started on 
3)@127.0.0.1:64313
I0715 19:03:59.309582 222965760 slave.cpp:603] New master detected at 
[email protected]:64313
I0715 19:03:59.309588 221892608 replica.cpp:676] Persisted action at 0
I0715 19:03:59.309665 222965760 slave.cpp:639] No credentials provided. 
Attempting to register without authentication
I0715 19:03:59.309685 225112064 status_update_manager.cpp:167] New master 
detected at [email protected]:64313
I0715 19:03:59.309798 223502336 slave.cpp:279] Slave resources: cpus(*):4; 
mem(*):7168; disk(*):470714; ports(*):[31000-32000]
I0715 19:03:59.310104 224038912 replica.cpp:508] Replica received write request 
for position 0
I0715 19:03:59.310331 222965760 slave.cpp:652] Detecting new master
I0715 19:03:59.310395 224038912 leveldb.cpp:438] Reading position from leveldb 
took 30us
I0715 19:03:59.310642 223502336 slave.cpp:324] Slave hostname: localhost
I0715 19:03:59.310657 223502336 slave.cpp:325] Slave checkpoint: true
I0715 19:03:59.310689 224038912 leveldb.cpp:343] Persisting action (14 bytes) 
to leveldb took 227us
I0715 19:03:59.310722 224038912 replica.cpp:676] Persisted action at 0
I0715 19:03:59.310936 222965760 replica.cpp:655] Replica received learned 
notice for position 0
I0715 19:03:59.311103 222965760 leveldb.cpp:343] Persisting action (16 bytes) 
to leveldb took 160us
    @        0x10b3d54f9  google::LogMessage::SendToLog()
I0715 19:03:59.311158 221892608 state.cpp:33] Recovering state from 
'/var/folders/67/g567hfcj4bjcd_bm3gsqs54h0000gn/T/mesos-XXXXXX.FUk9AYoy/2/meta'
I0715 19:03:59.311436 222965760 replica.cpp:676] Persisted action at 0
I0715 19:03:59.311514 222965760 replica.cpp:661] Replica learned NOP action at 
position 0
I0715 19:03:59.311544 221892608 status_update_manager.cpp:193] Recovering 
status update manager
I0715 19:03:59.311612 221892608 containerizer.cpp:287] Recovering containerizer
I0715 19:03:59.311643 222965760 log.cpp:672] Writer started with ending 
position 0
    @        0x10b3d5a24  google::LogMessage::Flush()
I0715 19:03:59.311983 225112064 slave.cpp:3130] Finished recovery
    @        0x10b3d8b0f  google::LogMessageFatal::~LogMessageFatal()
I0715 19:03:59.312419 224038912 leveldb.cpp:438] Reading position from leveldb 
took 43us
I0715 19:03:59.312515 222965760 slave.cpp:603] New master detected at 
[email protected]:64313
I0715 19:03:59.312854 222965760 slave.cpp:639] No credentials provided. 
Attempting to register without authentication
I0715 19:03:59.312891 222965760 slave.cpp:652] Detecting new master
I0715 19:03:59.312924 222965760 status_update_manager.cpp:167] New master 
detected at [email protected]:64313
    @        0x10b3d60f9  google::LogMessageFatal::~LogMessageFatal()
    @        0x10ad381b3  _CheckFatal::~_CheckFatal()
    @        0x10ad37a29  _CheckFatal::~_CheckFatal()
    @        0x10af8371f  mesos::internal::slave::Slave::__recover()
    @        0x10b30df43  process::ProcessBase::visit()
    @        0x10b304d44  process::ProcessManager::resume()
    @        0x10b30488f  process::schedule()
    @     0x7fff907b0899  _pthread_body
    @     0x7fff907b072a  _pthread_start
    @     0x7fff907b4fc9  thread_start
../../src/tests/script.cpp:85: Failure
Failed
low_level_scheduler_pthread_test.sh terminated with signal Abort trap: 6
make[3]: *** [check-local] Segmentation fault: 11
make[2]: *** [check-am] Error 2
make[1]: *** [check] Error 2
make: *** [check-recursive] Error 1
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
