Thanks, Guodong. Could you file a bug report for this? We knew sendfile() was
vulnerable to SIGPIPE because the syscall offers no way to suppress the signal
(as MSG_NOSIGNAL does for send()).
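
To illustrate the point above, here is a minimal sketch of one common workaround, assuming Linux: ignore SIGPIPE process-wide so that sendfile() to a closed peer fails with EPIPE instead of raising the signal. This is not libprocess code and not necessarily the fix that will land in Mesos; the helper names ignoreSigpipe() and sendfileToClosedPeer() are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of libprocess): ignore SIGPIPE for the
// whole process, so a write to a closed peer reports EPIPE via errno.
inline void ignoreSigpipe()
{
  struct sigaction action = {};
  action.sa_handler = SIG_IGN;
  sigemptyset(&action.sa_mask);
  int result = sigaction(SIGPIPE, &action, nullptr);
  assert(result == 0);
  (void) result;
}

// Hypothetical demo: call sendfile() on a socket whose peer has closed.
// Returns the errno set by sendfile(), 0 if the call unexpectedly
// succeeded, or -1 if the setup itself failed.
inline int sendfileToClosedPeer()
{
  int sockets[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) != 0) {
    return -1;
  }

  // A small temporary file to act as the sendfile() source.
  FILE* file = tmpfile();
  if (file == nullptr) {
    close(sockets[0]);
    close(sockets[1]);
    return -1;
  }
  fputs("hello", file);
  fflush(file);

  // Simulate the remote end going away before we send.
  close(sockets[1]);

  off_t offset = 0;
  ssize_t sent = sendfile(sockets[0], fileno(file), &offset, 5);
  int error = (sent < 0) ? errno : 0;

  fclose(file);
  close(sockets[0]);
  return error;
}
```

Calling ignoreSigpipe() once at startup and then checking for EPIPE after each sendfile() lets the caller close the connection cleanly. The trade-off is that the setting is process-wide, so every other write path must handle EPIPE as well.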


On Fri, Jun 7, 2013 at 7:47 PM, 王国栋 <[email protected]> wrote:

> Hi Vinod,
>
> Today I saw the problem again. I agree with you that it may be related to
> libevent. This time I got the stack trace; the last part of the log is
> below.
> Also, is libprocess implemented on top of libevent?
>
> I0608 10:25:11.896606 19079 hierarchical_allocator_process.hpp:569]
> Framework 201306071736-252063498-5050-19065-0000 filtered slave
> 201306071736-252063498-5050-19065-2 for 5secs
> I0608 10:25:11.896785 19079 hierarchical_allocator_process.hpp:569]
> Framework 201306071736-252063498-5050-19065-0000 filtered slave
> 201306071736-252063498-5050-19065-3 for 5secs
> W0608 10:25:12.738118 19082 logging.cpp:52] Received signal 'Broken pipe',
> escalating to SIGABRT
> *** Aborted at 1370658312 (unix time) try "date -d @1370658312" if you are
> using GNU date ***
> PC: @       0x30c7e0e38d (unknown)
> *** SIGABRT (@0x4a79) received by PID 19065 (TID 0x4bab6940) from PID
> 19065; stack trace: ***
>     @       0x30c7e0e4c0 (unknown)
>     @       0x30c7e0e38d (unknown)
>     @     0x2b3c35618218 mesos::internal::logging::handler()
>     @       0x30c7e0e4c0 (unknown)
>     @       0x30c72cab4a (unknown)
>     @     0x2b3c357700f1 process::send_file()
>     @     0x2b3c358bd272 ev_invoke_pending
>     @     0x2b3c358c1e0f ev_run
>     @     0x2b3c35766c5b process::serve()
>     @       0x30c7e06367 (unknown)
>     @       0x30c72d30ad (unknown)
>
>
> I am using the code from the git master branch. The latest commit of the
> master snapshot is:
> commit 85b1f4af1ae54812993863422f8c087448360b79
> Author: Benjamin Hindman <[email protected]>
> Date:   Thu May 30 14:53:40 2013 -0700
>
>     Added a retry option to cgroups::mount in order to deal with a bug
>     with the kernel unmounting the hierarchy asynchronously causing
>     subsequent mounts to fail.
>
>     From: Thomas Marshall <[email protected]>
>     Review: https://reviews.apache.org/r/11547
>
>
>
>
> Guodong
>
>
> On Sat, Jun 8, 2013 at 12:22 AM, Vinod Kone <[email protected]> wrote:
>
> > what version of mesos are you running? we have seen broken pipes before,
> > but almost always in the slaves. our hypothesis is that it is a bug in
> > libev. we have recently upgraded libev, so you can try building mesos from
> > the master branch of
> > https://git-wip-us.apache.org/repos/asf?p=incubator-mesos.git to get it.
> > let us know if that fixes your problem. also, stack traces would be nice.
> >
> >
> > On Fri, Jun 7, 2013 at 1:57 AM, 王国栋 <[email protected]> wrote:
> >
> > > Hi guys,
> > >
> > > I am running into an issue where the master crashes due to Broken Pipe.
> > >
> > > I have 3 masters in the cluster; two of them crashed within about 2
> > > minutes, both because of Broken Pipe. Then, after a while, the 3rd master
> > > went down, but I cannot find any error log on the 3rd master. Any ideas
> > > about this? I am sorry that I cannot provide the stack trace: I use
> > > mesos-daemon.sh to start the process, and I think the stack trace is
> > > redirected to /dev/null. I will change the start script later. Thanks.
> > >
> > > The final log of the master looks like this:
> > > I0607 16:26:18.820529  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0000 filtered slave
> > > 201306071457-252063498-5050-6406-1 for 1secs
> > > I0607 16:26:18.820653  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0000 filtered slave
> > > 201306071457-252063498-5050-6406-4 for 1secs
> > > I0607 16:26:19.817425  5824 master.cpp:1281] Sending 3 offers to framework
> > > 201306071457-285617930-5050-5812-0000
> > > W0607 16:26:19.843829  5828 master.cpp:83] No whitelist given. Advertising
> > > offers for all slaves
> > > I0607 16:26:20.194185  5824 master.cpp:1281] Sending 3 offers to framework
> > > 201306071457-285617930-5050-5812-0001
> > > I0607 16:26:20.223670  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19186 on slave
> > > 201306071457-252063498-5050-6406-0 (hd5dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0000
> > > I0607 16:26:20.223917  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19187 on slave
> > > 201306071457-252063498-5050-6406-2 (hd4dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0000
> > > I0607 16:26:20.224081  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19188 on slave
> > > 201306071457-252063498-5050-6406-5 (hd6dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0000
> > > I0607 16:26:20.224234  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19189 on slave
> > > 201306071457-252063498-5050-6406-3 (hd3dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0001
> > > I0607 16:26:20.224396  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19190 on slave
> > > 201306071457-252063498-5050-6406-1 (hd7dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0001
> > > I0607 16:26:20.224552  5824 master.cpp:1514] Processing reply for offer
> > > 201306071457-285617930-5050-5812-19191 on slave
> > > 201306071457-252063498-5050-6406-4 (hd2dz.prod.mediav.com) for framework
> > > 201306071457-285617930-5050-5812-0001
> > > I0607 16:26:20.224952  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0000 filtered slave
> > > 201306071457-252063498-5050-6406-0 for 1secs
> > > I0607 16:26:20.225141  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0000 filtered slave
> > > 201306071457-252063498-5050-6406-2 for 1secs
> > > I0607 16:26:20.225275  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0000 filtered slave
> > > 201306071457-252063498-5050-6406-5 for 1secs
> > > I0607 16:26:20.225410  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0001 filtered slave
> > > 201306071457-252063498-5050-6406-3 for 1secs
> > > I0607 16:26:20.225549  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0001 filtered slave
> > > 201306071457-252063498-5050-6406-1 for 1secs
> > > I0607 16:26:20.225692  5824 hierarchical_allocator_process.hpp:569]
> > > Framework 201306071457-285617930-5050-5812-0001 filtered slave
> > > 201306071457-252063498-5050-6406-4 for 1secs
> > > W0607 16:26:20.424161  5829 logging.cpp:52] Received signal 'Broken pipe',
> > > escalating to SIGABRT
> > >
> > >
> > >
> > > Guodong
> > >
> >
>
