[ 
https://issues.apache.org/jira/browse/MESOS-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394409#comment-15394409
 ] 

Joseph Wu commented on MESOS-5907:
----------------------------------

Also observed this infinite loop on the ASF CI because the agent took more than 
2 seconds to fail:

{code}
[ RUN      ] ExamplesTest.DiskFullFramework
Using temporary directory '/tmp/ExamplesTest_DiskFullFramework_lBhMQ9'
/mesos/mesos-1.1.0/src/tests/disk_full_framework_test.sh: line 8: 
/mesos/mesos-1.1.0/_build/src/colors.sh: No such file or directory
/mesos/mesos-1.1.0/src/tests/disk_full_framework_test.sh: line 9: 
/mesos/mesos-1.1.0/_build/src/atexit.sh: No such file or directory
Launched master at 451
Launched agent at 469
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0726 14:53:44.130648   451 main.cpp:263] Build: 2016-07-26 14:31:53 by 
I0726 14:53:44.130833   451 main.cpp:264] Version: 1.1.0
I0726 14:53:44.134968   451 process.cpp:1071] libprocess is initialized on 
127.0.0.1:5432 with 16 worker threads
I0726 14:53:44.135056   451 logging.cpp:199] Logging to STDERR
I0726 14:53:44.136104   451 main.cpp:370] Using 'HierarchicalDRF' allocator
I0726 14:53:44.139860   451 leveldb.cpp:174] Opened db in 3.492501ms
I0726 14:53:44.141086   451 leveldb.cpp:181] Compacted db in 1.171278ms
I0726 14:53:44.141196   451 leveldb.cpp:196] Created db iterator in 74126ns
I0726 14:53:44.141222   451 leveldb.cpp:202] Seeked to beginning of db in 4323ns
I0726 14:53:44.141232   451 leveldb.cpp:271] Iterated through 0 keys in the db 
in 356ns
I0726 14:53:44.141439   451 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0726 14:53:44.144086   496 recover.cpp:451] Starting replica recovery
I0726 14:53:44.144966   496 recover.cpp:477] Replica is in EMPTY status
I0726 14:53:44.145054   451 main.cpp:484] Creating default 'local' authorizer
I0726 14:53:44.145643   451 main.cpp:543] Starting Mesos master
I0726 14:53:44.147657   493 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (3)@127.0.0.1:5432
I0726 14:53:44.148839   493 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I0726 14:53:44.149585   494 recover.cpp:568] Updating replica status to STARTING
I0726 14:53:44.150604   491 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 825875ns
I0726 14:53:44.150635   491 replica.cpp:320] Persisted replica status to 
STARTING
I0726 14:53:44.150743   451 master.cpp:375] Master 
c2149f99-5702-44d7-8e17-040e3e228403 (localhost) started on 127.0.0.1:5432
I0726 14:53:44.151023   490 recover.cpp:477] Replica is in STARTING status
I0726 14:53:44.150774   451 master.cpp:377] Flags at startup: 
--acls="permissive: false
register_frameworks {
  principals {
    type: SOME
    values: "test-principal"
  }
  roles {
    type: SOME
    values: "*"
  }
}
run_tasks {
  principals {
    type: SOME
    values: "test-principal"
  }
  users {
    type: SOME
    values: "mesos"
  }
}
" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="true" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/tmp/ExamplesTest_DiskFullFramework_lBhMQ9/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --initialize_driver_logging="true" 
--ip="127.0.0.1" --log_auto_initialize="true" --logbufsecs="0" 
--logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--port="5432" --quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="20secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-1.1.0/src/webui" --work_dir="/tmp/mesos-bXDn8B" 
--zk_session_timeout="10secs"
W0726 14:53:44.151664   451 master.cpp:380] 
**************************************************
Master bound to loopback interface! Cannot communicate with remote schedulers 
or agents. You might want to set '--ip' flag to a routable IP address.
**************************************************
I0726 14:53:44.151708   451 master.cpp:427] Master only allowing authenticated 
frameworks to register
I0726 14:53:44.151717   451 master.cpp:443] Master allowing unauthenticated 
agents to register
I0726 14:53:44.151722   451 master.cpp:457] Master allowing HTTP frameworks to 
register without authentication
I0726 14:53:44.151731   451 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/ExamplesTest_DiskFullFramework_lBhMQ9/credentials'
W0726 14:53:44.151836   451 credentials.hpp:52] Permissions on credentials file 
'/tmp/ExamplesTest_DiskFullFramework_lBhMQ9/credentials' are too open; it is 
recommended that your credentials file is NOT accessible by others
I0726 14:53:44.152092   451 master.cpp:499] Using default 'crammd5' 
authenticator
I0726 14:53:44.152335   451 authenticator.cpp:519] Initializing server SASL
I0726 14:53:44.152325   495 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (4)@127.0.0.1:5432
I0726 14:53:44.152746   489 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I0726 14:53:44.153059   451 auxprop.cpp:73] Initialized in-memory auxiliary 
property plugin
I0726 14:53:44.153189   451 master.cpp:579] Authorization enabled
I0726 14:53:44.153281   499 recover.cpp:568] Updating replica status to VOTING
I0726 14:53:44.153734   497 whitelist_watcher.cpp:77] No whitelist given
I0726 14:53:44.153813   501 hierarchical.cpp:151] Initialized hierarchical 
allocator process
I0726 14:53:44.153937   493 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 430286ns
I0726 14:53:44.153977   493 replica.cpp:320] Persisted replica status to VOTING
I0726 14:53:44.154110   503 recover.cpp:582] Successfully joined the Paxos group
I0726 14:53:44.154350   503 recover.cpp:466] Recover process terminated
I0726 14:53:44.171664   492 master.cpp:1847] The newly elected leader is 
[email protected]:5432 with id c2149f99-5702-44d7-8e17-040e3e228403
I0726 14:53:44.171751   492 master.cpp:1860] Elected as the leading master!
I0726 14:53:44.171789   492 master.cpp:1547] Recovering from registrar
I0726 14:53:44.172025   498 registrar.cpp:332] Recovering registrar
I0726 14:53:44.173367   499 log.cpp:553] Attempting to start the writer
I0726 14:53:44.176110   489 replica.cpp:493] Replica received implicit promise 
request from (5)@127.0.0.1:5432 with proposal 1
I0726 14:53:44.176636   489 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 472618ns
I0726 14:53:44.176662   489 replica.cpp:342] Persisted promised to 1
I0726 14:53:44.177786   489 coordinator.cpp:238] Coordinator attempting to fill 
missing positions
I0726 14:53:44.179344   503 replica.cpp:388] Replica received explicit promise 
request from (6)@127.0.0.1:5432 for position 0 with proposal 2
I0726 14:53:44.179770   503 leveldb.cpp:341] Persisting action (8 bytes) to 
leveldb took 326955ns
I0726 14:53:44.179795   503 replica.cpp:712] Persisted action at 0
I0726 14:53:44.181021   496 replica.cpp:537] Replica received write request for 
position 0 from (7)@127.0.0.1:5432
I0726 14:53:44.181149   496 leveldb.cpp:436] Reading position from leveldb took 
89421ns
I0726 14:53:44.181532   496 leveldb.cpp:341] Persisting action (14 bytes) to 
leveldb took 308801ns
I0726 14:53:44.181555   496 replica.cpp:712] Persisted action at 0
I0726 14:53:44.182224   490 replica.cpp:691] Replica received learned notice 
for position 0 from @0.0.0.0:0
I0726 14:53:44.182595   490 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 301273ns
I0726 14:53:44.182616   490 replica.cpp:712] Persisted action at 0
I0726 14:53:44.182636   490 replica.cpp:697] Replica learned NOP action at 
position 0
I0726 14:53:44.183385   504 log.cpp:569] Writer started with ending position 0
I0726 14:53:44.185390   503 leveldb.cpp:436] Reading position from leveldb took 
30503ns
I0726 14:53:44.188508   503 registrar.cpp:365] Successfully fetched the 
registry (0B) in 16.417024ms
I0726 14:53:44.188736   503 registrar.cpp:464] Applied 1 operations in 51155ns; 
attempting to update the 'registry'
I0726 14:53:44.191341   503 log.cpp:577] Attempting to append 156 bytes to the 
log
I0726 14:53:44.191520   489 coordinator.cpp:348] Coordinator attempting to 
write APPEND action at position 1
I0726 14:53:44.192415   491 replica.cpp:537] Replica received write request for 
position 1 from (8)@127.0.0.1:5432
I0726 14:53:44.192991   491 leveldb.cpp:341] Persisting action (175 bytes) to 
leveldb took 498290ns
I0726 14:53:44.193022   491 replica.cpp:712] Persisted action at 1
I0726 14:53:44.194257   498 replica.cpp:691] Replica received learned notice 
for position 1 from @0.0.0.0:0
I0726 14:53:44.194766   498 leveldb.cpp:341] Persisting action (177 bytes) to 
leveldb took 437138ns
I0726 14:53:44.194793   498 replica.cpp:712] Persisted action at 1
I0726 14:53:44.194823   498 replica.cpp:697] Replica learned APPEND action at 
position 1
I0726 14:53:44.196669   495 registrar.cpp:509] Successfully updated the 
'registry' in 7.785984ms
I0726 14:53:44.196846   495 registrar.cpp:395] Successfully recovered registrar
I0726 14:53:44.197120   492 log.cpp:596] Attempting to truncate the log to 1
I0726 14:53:44.197366   500 coordinator.cpp:348] Coordinator attempting to 
write TRUNCATE action at position 2
I0726 14:53:44.197446   504 master.cpp:1655] Recovered 0 agents from the 
Registry (118B) ; allowing 10mins for agents to re-register
I0726 14:53:44.197480   496 hierarchical.cpp:178] Skipping recovery of 
hierarchical allocator: nothing to recover
I0726 14:53:44.198210   504 replica.cpp:537] Replica received write request for 
position 2 from (9)@127.0.0.1:5432
I0726 14:53:44.198786   504 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 529434ns
I0726 14:53:44.198809   504 replica.cpp:712] Persisted action at 2
I0726 14:53:44.199458   498 replica.cpp:691] Replica received learned notice 
for position 2 from @0.0.0.0:0
I0726 14:53:44.199968   498 leveldb.cpp:341] Persisting action (18 bytes) to 
leveldb took 467596ns
I0726 14:53:44.200031   498 leveldb.cpp:399] Deleting ~1 keys from leveldb took 
39189ns
I0726 14:53:44.200052   498 replica.cpp:712] Persisted action at 2
I0726 14:53:44.200084   498 replica.cpp:697] Replica learned TRUNCATE action at 
position 2
I0726 14:53:45.155596   499 hierarchical.cpp:1537] No allocations performed
I0726 14:53:45.155730   499 hierarchical.cpp:1172] Performed allocation for 0 
agents in 353342ns
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0726 14:53:46.147034   469 main.cpp:243] Build: 2016-07-26 14:31:53 by 
I0726 14:53:46.147214   469 main.cpp:244] Version: 1.1.0
I0726 14:53:46.151357   469 process.cpp:1071] libprocess is initialized on 
172.17.0.2:5051 with 16 worker threads
I0726 14:53:46.151448   469 logging.cpp:199] Logging to STDERR
I0726 14:53:46.157204   493 hierarchical.cpp:1537] No allocations performed
I0726 14:53:46.157335   493 hierarchical.cpp:1172] Performed allocation for 0 
agents in 215495ns
I0726 14:53:46.418125   469 systemd.cpp:237] systemd version `219` detected
I0726 14:53:46.418192   469 main.cpp:342] Inializing systemd state
Failed to initialize systemd: Failed to locate systemd runtime directory: 
/run/systemd/system
I0726 14:53:47.158877   502 hierarchical.cpp:1537] No allocations performed
I0726 14:53:47.159018   502 hierarchical.cpp:1172] Performed allocation for 0 
agents in 243814ns
I0726 14:53:48.160492   490 hierarchical.cpp:1537] No allocations performed
I0726 14:53:48.160564   490 hierarchical.cpp:1172] Performed allocation for 0 
agents in 123129ns
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0726 14:53:48.266243   507 process.cpp:1071] libprocess is initialized on 
172.17.0.2:56935 with 16 worker threads
I0726 14:53:48.266592   507 disk_full_framework.cpp:436] Enabling 
authentication for the framework
I0726 14:53:48.268671   507 logging.cpp:199] Logging to STDERR
I0726 14:53:48.270119   507 sched.cpp:226] Version: 1.1.0
I0726 14:53:48.283542   559 sched.cpp:330] New master detected at 
[email protected]:5432
I0726 14:53:48.285239   559 sched.cpp:396] Authenticating with master 
[email protected]:5432
I0726 14:53:48.285272   559 sched.cpp:403] Using default CRAM-MD5 authenticatee
I0726 14:53:48.286123   555 authenticatee.cpp:97] Initializing client SASL
I0726 14:53:48.286865   555 authenticatee.cpp:121] Creating new client SASL 
connection
I0726 14:53:48.288255   494 master.cpp:5883] Authenticating 
[email protected]:56935
I0726 14:53:48.288504   492 authenticator.cpp:414] Starting authentication 
session for crammd5_authenticatee(1)@172.17.0.2:56935
I0726 14:53:48.289562   496 authenticator.cpp:98] Creating new server SASL 
connection
I0726 14:53:48.290738   551 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I0726 14:53:48.290882   551 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0726 14:53:48.291503   493 authenticator.cpp:204] Received SASL authentication 
start
I0726 14:53:48.291597   493 authenticator.cpp:326] Authentication requires more 
steps
I0726 14:53:48.292167   546 authenticatee.cpp:259] Received SASL authentication 
step
I0726 14:53:48.292670   491 authenticator.cpp:232] Received SASL authentication 
step
I0726 14:53:48.292727   491 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '20ee489ad4ce' server FQDN: '20ee489ad4ce' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0726 14:53:48.292760   491 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
I0726 14:53:48.292896   491 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0726 14:53:48.292948   491 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: '20ee489ad4ce' server FQDN: '20ee489ad4ce' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0726 14:53:48.292985   491 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0726 14:53:48.292995   491 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0726 14:53:48.293032   491 authenticator.cpp:318] Authentication success
I0726 14:53:48.293509   549 authenticatee.cpp:299] Authentication success
I0726 14:53:48.293598   500 master.cpp:5913] Successfully authenticated 
principal 'test-principal' at 
[email protected]:56935
I0726 14:53:48.293686   493 authenticator.cpp:432] Authentication session 
cleanup for crammd5_authenticatee(1)@172.17.0.2:56935
I0726 14:53:48.294248   554 sched.cpp:502] Successfully authenticated with 
master [email protected]:5432
I0726 14:53:48.294280   554 sched.cpp:820] Sending SUBSCRIBE call to 
[email protected]:5432
I0726 14:53:48.294695   554 sched.cpp:853] Will retry registration in 
1.314239818secs if necessary
I0726 14:53:48.295074   490 master.cpp:2424] Received SUBSCRIBE call for 
framework 'Disk Full Framework (C++)' at 
[email protected]:56935
I0726 14:53:48.295230   490 master.cpp:1886] Authorizing framework principal 
'test-principal' to receive offers for role '*'
I0726 14:53:48.296437   502 master.cpp:2500] Subscribing framework Disk Full 
Framework (C++) with checkpointing enabled and capabilities [  ]
I0726 14:53:48.298159   496 hierarchical.cpp:271] Added framework 
c2149f99-5702-44d7-8e17-040e3e228403-0000
I0726 14:53:48.298214   561 sched.cpp:743] Framework registered with 
c2149f99-5702-44d7-8e17-040e3e228403-0000
I0726 14:53:48.298264   561 disk_full_framework.cpp:328] Registered with 
framework ID: c2149f99-5702-44d7-8e17-040e3e228403-0000
I0726 14:53:48.298279   496 hierarchical.cpp:1537] No allocations performed
I0726 14:53:48.298318   561 sched.cpp:757] Scheduler::registered took 50836ns
I0726 14:53:48.298378   496 hierarchical.cpp:1632] No inverse offers to send 
out!
I0726 14:53:48.298486   496 hierarchical.cpp:1172] Performed allocation for 0 
agents in 290704ns
I0726 14:53:49.161883   499 hierarchical.cpp:1537] No allocations performed
I0726 14:53:49.161928   499 hierarchical.cpp:1632] No inverse offers to send 
out!
I0726 14:53:49.162045   499 hierarchical.cpp:1172] Performed allocation for 0 
agents in 203080ns
I0726 14:53:50.163571   495 hierarchical.cpp:1537] No allocations performed
I0726 14:53:50.163611   495 hierarchical.cpp:1632] No inverse offers to send 
out!
...
{code}

> ExamplesTest.DiskFullFramework fails on Arch
> --------------------------------------------
>
>                 Key: MESOS-5907
>                 URL: https://issues.apache.org/jira/browse/MESOS-5907
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>         Environment: Linux archlinux.vagrant.vm 4.6.4-1-ARCH #1 SMP PREEMPT 
> Mon Jul 11 19:12:32 CEST 2016 x86_64 GNU/Linux
>            Reporter: Neil Conway
>            Assignee: Joseph Wu
>              Labels: mesosphere
>         Attachments: disk_full_framework.log
>
>
> This test fails consistently on recent Arch linux, running in a VM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to