Hi guys, I am running into an issue where the master crashes due to Broken Pipe.
I have 3 masters in the cluster, and two of them crash within about 2 minutes, both because of Broken Pipe. Then, after a while, the 3rd master goes down, but I cannot find any error in its log. Any ideas about this?

I am sorry that I cannot provide the stack trace: I use mesos-daemon.sh to start the process, and I think the stack trace is redirected to /dev/null. I will change the start script later. Thanks.

The last lines of the master log look like this:

I0607 16:26:18.820529 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0000 filtered slave 201306071457-252063498-5050-6406-1 for 1secs
I0607 16:26:18.820653 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0000 filtered slave 201306071457-252063498-5050-6406-4 for 1secs
I0607 16:26:19.817425 5824 master.cpp:1281] Sending 3 offers to framework 201306071457-285617930-5050-5812-0000
W0607 16:26:19.843829 5828 master.cpp:83] No whitelist given. Advertising offers for all slaves
I0607 16:26:20.194185 5824 master.cpp:1281] Sending 3 offers to framework 201306071457-285617930-5050-5812-0001
I0607 16:26:20.223670 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19186 on slave 201306071457-252063498-5050-6406-0 (hd5dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0000
I0607 16:26:20.223917 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19187 on slave 201306071457-252063498-5050-6406-2 (hd4dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0000
I0607 16:26:20.224081 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19188 on slave 201306071457-252063498-5050-6406-5 (hd6dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0000
I0607 16:26:20.224234 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19189 on slave 201306071457-252063498-5050-6406-3 (hd3dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0001
I0607 16:26:20.224396 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19190 on slave 201306071457-252063498-5050-6406-1 (hd7dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0001
I0607 16:26:20.224552 5824 master.cpp:1514] Processing reply for offer 201306071457-285617930-5050-5812-19191 on slave 201306071457-252063498-5050-6406-4 (hd2dz.prod.mediav.com) for framework 201306071457-285617930-5050-5812-0001
I0607 16:26:20.224952 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0000 filtered slave 201306071457-252063498-5050-6406-0 for 1secs
I0607 16:26:20.225141 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0000 filtered slave 201306071457-252063498-5050-6406-2 for 1secs
I0607 16:26:20.225275 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0000 filtered slave 201306071457-252063498-5050-6406-5 for 1secs
I0607 16:26:20.225410 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0001 filtered slave 201306071457-252063498-5050-6406-3 for 1secs
I0607 16:26:20.225549 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0001 filtered slave 201306071457-252063498-5050-6406-1 for 1secs
I0607 16:26:20.225692 5824 hierarchical_allocator_process.hpp:569] Framework 201306071457-285617930-5050-5812-0001 filtered slave 201306071457-252063498-5050-6406-4 for 1secs
W0607 16:26:20.424161 5829 logging.cpp:52] Received signal 'Broken pipe', escalating to SIGABRT

Guodong
