What version of Mesos are you running? We have seen broken pipes before,
but almost always in the slaves. Our hypothesis is that it is a bug in
libev. We recently upgraded libev, so you can try building Mesos from
the master branch of
https://git-wip-us.apache.org/repos/asf?p=incubator-mesos.git to pick up
the upgrade. Let us know if that fixes your problem. Stack traces would
also help.
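
For background on the 'Broken pipe' line in the log: a process receives SIGPIPE when it writes to a pipe or socket whose other end has already been closed, and the default action for that signal is to terminate the process. If SIGPIPE is ignored instead, the same write fails with the error EPIPE, which the caller can handle. The sketch below is not Mesos code; the function name write_to_closed_pipe is made up for illustration, and it only demonstrates the general POSIX behaviour on a local pipe.

```python
import errno
import os
import signal


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is closed and return the errno seen.

    Hypothetical helper for illustration only; not part of Mesos.
    """
    # Ignore SIGPIPE so the failed write is reported as an error
    # instead of terminating the process.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the peer going away

    try:
        os.write(write_fd, b"offer")
    except OSError as exc:
        # With SIGPIPE ignored, the write fails with EPIPE ("Broken pipe").
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    code = write_to_closed_pipe()
    print(errno.errorcode.get(code, code))
```

Whether ignoring the signal is appropriate depends on the program; here it only makes the failure visible as an ordinary error rather than a crash.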


On Fri, Jun 7, 2013 at 1:57 AM, 王国栋 <[email protected]> wrote:

> Hi guys,
>
> I am running into the issue that the master crashes due to Broken Pipe.
>
> I have 3 masters in the cluster, and two of them crash in about 2 minutes,
> both because of Broken Pipe. Then, after a while, the 3rd master goes
> down, but I cannot find any error in the 3rd master's log. Any ideas about
> this? I am sorry that I cannot provide the stack trace: I use
> mesos-daemon.sh to start the process, and I think the stack trace is
> redirected to /dev/null. I will change the start script later. Thanks.
>
> The final log lines of the master look like this:
> I0607 16:26:18.820529  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0000 filtered slave
> 201306071457-252063498-5050-6406-1 for 1secs
> I0607 16:26:18.820653  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0000 filtered slave
> 201306071457-252063498-5050-6406-4 for 1secs
> I0607 16:26:19.817425  5824 master.cpp:1281] Sending 3 offers to framework
> 201306071457-285617930-5050-5812-0000
> W0607 16:26:19.843829  5828 master.cpp:83] No whitelist given. Advertising
> offers for all slaves
> I0607 16:26:20.194185  5824 master.cpp:1281] Sending 3 offers to framework
> 201306071457-285617930-5050-5812-0001
> I0607 16:26:20.223670  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19186 on slave
> 201306071457-252063498-5050-6406-0 (hd5dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0000
> I0607 16:26:20.223917  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19187 on slave
> 201306071457-252063498-5050-6406-2 (hd4dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0000
> I0607 16:26:20.224081  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19188 on slave
> 201306071457-252063498-5050-6406-5 (hd6dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0000
> I0607 16:26:20.224234  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19189 on slave
> 201306071457-252063498-5050-6406-3 (hd3dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0001
> I0607 16:26:20.224396  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19190 on slave
> 201306071457-252063498-5050-6406-1 (hd7dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0001
> I0607 16:26:20.224552  5824 master.cpp:1514] Processing reply for offer
> 201306071457-285617930-5050-5812-19191 on slave
> 201306071457-252063498-5050-6406-4 (hd2dz.prod.mediav.com) for framework
> 201306071457-285617930-5050-5812-0001
> I0607 16:26:20.224952  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0000 filtered slave
> 201306071457-252063498-5050-6406-0 for 1secs
> I0607 16:26:20.225141  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0000 filtered slave
> 201306071457-252063498-5050-6406-2 for 1secs
> I0607 16:26:20.225275  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0000 filtered slave
> 201306071457-252063498-5050-6406-5 for 1secs
> I0607 16:26:20.225410  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0001 filtered slave
> 201306071457-252063498-5050-6406-3 for 1secs
> I0607 16:26:20.225549  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0001 filtered slave
> 201306071457-252063498-5050-6406-1 for 1secs
> I0607 16:26:20.225692  5824 hierarchical_allocator_process.hpp:569]
> Framework 201306071457-285617930-5050-5812-0001 filtered slave
> 201306071457-252063498-5050-6406-4 for 1secs
> W0607 16:26:20.424161  5829 logging.cpp:52] Received signal 'Broken pipe',
> escalating to SIGABRT
>
>
>
> Guodong
>
