MESOS-2865: executor log showing byte dump of HTTP requests; note that
task-2 is never launched even though the whole frame is read by Go's
net/http lib
Gist: https://gist.github.com/jdef/40a550db373c73298af5

On Fri, Aug 21, 2015 at 5:29 PM, Vetoshkin Nikita <
[email protected]> wrote:

> Can you show the bytes you captured? Maybe there is an issue with HTTP body
> reading and the golang HTTP library thinks that there's more to come.
>
> On Fri, Aug 21, 2015, 13:59 James DeFelice (JIRA) <[email protected]> wrote:
>
> >
> >     [
> >
> https://issues.apache.org/jira/browse/MESOS-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707454#comment-14707454
> > ]
> >
> > James DeFelice commented on MESOS-2865:
> > ---------------------------------------
> >
> > Pretty sure the problem is related to Go's internal net/http library
> > buffering. Here's what is happening:
> > - mesos POSTs to a URL, with keep-alive
> > - Go's net/http reads the first request
> > - the mesos-go handler sends a response
> > - net/http reads the second request (same pipeline)
> > - the mesos-go handler sends a response
> > - net/http reads the third request, but doesn't deliver it to the
> > mesos-go handler (even though the full request frame was read)
> > - the mesos-go handler waits forever because there's no timeout on the
> > connection
> > - the undelivered frame is never sent to the message handler (a minimal
> > sketch of this sequence follows)
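> >
> > For context, here's a minimal sketch of that sequence (illustrative only;
> > the addresses, handler body and "task-N" payloads are made up, this is
> > not mesos-go code): three POSTs are pipelined on one keep-alive
> > connection and the handler logs each request it is actually handed. If
> > the effect above reproduces, the handler logs only two requests even
> > though all three frames arrive on the wire.
> > {code}
> > // Sketch: pipelined keep-alive POSTs against a plain net/http server.
> > package main
> >
> > import (
> >     "bufio"
> >     "fmt"
> >     "io"
> >     "io/ioutil"
> >     "log"
> >     "net"
> >     "net/http"
> >     "time"
> > )
> >
> > func main() {
> >     // Plain net/http server; the handler just logs what it is handed.
> >     ln, err := net.Listen("tcp", "127.0.0.1:0")
> >     if err != nil {
> >         log.Fatal(err)
> >     }
> >     go http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
> >         body, _ := ioutil.ReadAll(r.Body)
> >         log.Printf("handler got request, body=%q", body)
> >         io.WriteString(w, "ok")
> >     }))
> >
> >     // Client: one connection, three pipelined POST frames.
> >     conn, err := net.Dial("tcp", ln.Addr().String())
> >     if err != nil {
> >         log.Fatal(err)
> >     }
> >     defer conn.Close()
> >     for i := 1; i <= 3; i++ {
> >         body := fmt.Sprintf("task-%d", i)
> >         fmt.Fprintf(conn,
> >             "POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: %d\r\n\r\n%s",
> >             len(body), body)
> >     }
> >
> >     // Expect three responses; a missing one shows up as a read timeout.
> >     conn.SetReadDeadline(time.Now().Add(3 * time.Second))
> >     br := bufio.NewReader(conn)
> >     for i := 1; i <= 3; i++ {
> >         resp, err := http.ReadResponse(br, &http.Request{Method: "POST"})
> >         if err != nil {
> >             log.Fatalf("reading response %d: %v", i, err)
> >         }
> >         b, _ := ioutil.ReadAll(resp.Body)
> >         resp.Body.Close()
> >         log.Printf("response %d: %s %q", i, resp.Status, b)
> >     }
> > }
> > {code}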
> >
> > I've verified that the bytes are read from the connection because I added
> > an io.Reader spy that logs all read byte blocks to stdout. It's very clear
> > that the entire message has been received by net/http but for some reason
> > it's buffering/hoarding the 3rd request frame. This happens in go-1.4.2,
> > and in the just-released go-1.5.
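> >
> > For reference, the spy can be as simple as the sketch below (illustrative;
> > the port and handler are made up): the accepted net.Conn is wrapped so
> > every Read from the wire is hex-dumped before net/http ever parses it.
> > {code}
> > package main
> >
> > import (
> >     "encoding/hex"
> >     "fmt"
> >     "net"
> >     "net/http"
> > )
> >
> > // spyConn wraps a net.Conn and dumps every block of bytes read from it.
> > type spyConn struct {
> >     net.Conn
> > }
> >
> > func (c spyConn) Read(p []byte) (int, error) {
> >     n, err := c.Conn.Read(p)
> >     if n > 0 {
> >         fmt.Printf("read %d bytes:\n%s", n, hex.Dump(p[:n]))
> >     }
> >     return n, err
> > }
> >
> > // spyListener wraps Accept so every new connection gets the spy.
> > type spyListener struct {
> >     net.Listener
> > }
> >
> > func (l spyListener) Accept() (net.Conn, error) {
> >     conn, err := l.Listener.Accept()
> >     if err != nil {
> >         return nil, err
> >     }
> >     return spyConn{conn}, nil
> > }
> >
> > func main() {
> >     ln, err := net.Listen("tcp", ":8080")
> >     if err != nil {
> >         panic(err)
> >     }
> >     // Plain net/http on top of the spied listener; handlers are unchanged.
> >     http.Serve(spyListener{ln}, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
> >         w.Write([]byte("ok"))
> >     }))
> > }
> > {code}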
> >
> > I've tested this further by writing a special net/http Handler that
> > bootstraps from Go's net/http server but hijacks the connection of the
> > initial request immediately and assumes total control over the message
> > framing from there on. I'm unable to reproduce the lost-message effect
> > with the mini http server.
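> >
> > Roughly, that experiment looks like the sketch below (illustrative, not
> > the actual patch): the connection is hijacked via http.Hijacker on the
> > first request, and every subsequent frame on that connection is read
> > directly with http.ReadRequest, so net/http's per-connection read loop
> > is out of the picture.
> > {code}
> > package main
> >
> > import (
> >     "io"
> >     "io/ioutil"
> >     "log"
> >     "net/http"
> > )
> >
> > func hijackingHandler(w http.ResponseWriter, r *http.Request) {
> >     // Drain the first request's body before hijacking; afterwards the
> >     // original Request.Body must not be used.
> >     firstBody, _ := ioutil.ReadAll(r.Body)
> >     r.Body.Close()
> >
> >     hj, ok := w.(http.Hijacker)
> >     if !ok {
> >         http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
> >         return
> >     }
> >     conn, rw, err := hj.Hijack()
> >     if err != nil {
> >         log.Printf("hijack failed: %v", err)
> >         return
> >     }
> >     defer conn.Close()
> >     log.Printf("got %s %s, body=%q", r.Method, r.URL.Path, firstBody)
> >
> >     // Minimal HTTP/1.1 response; from here on we own the framing.
> >     respond := func() error {
> >         io.WriteString(rw, "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
> >         return rw.Flush()
> >     }
> >     if err := respond(); err != nil {
> >         return
> >     }
> >
> >     // Keep reading keep-alive/pipelined requests off the same connection.
> >     for {
> >         req, err := http.ReadRequest(rw.Reader)
> >         if err != nil {
> >             return // connection closed or malformed frame
> >         }
> >         body, _ := ioutil.ReadAll(req.Body)
> >         req.Body.Close()
> >         log.Printf("got %s %s, body=%q", req.Method, req.URL.Path, body)
> >         if err := respond(); err != nil {
> >             return
> >         }
> >     }
> > }
> >
> > func main() {
> >     log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(hijackingHandler)))
> > }
> > {code}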
> >
> > > intermittently the executor is not receiving TASK_KILLED
> > > --------------------------------------------------------
> > >
> > >                 Key: MESOS-2865
> > >                 URL: https://issues.apache.org/jira/browse/MESOS-2865
> > >             Project: Mesos
> > >          Issue Type: Bug
> > >    Affects Versions: 0.21.1, 0.23.0
> > >         Environment: {code}
> > > $ dpkg -l |grep -e mesos
> > > ii  mesos                               0.21.1-1.1.ubuntu1404      amd64        Cluster resource manager with efficient resource isolation
> > > $ uname -a
> > > Linux node-1 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> > > {code}
> > >            Reporter: James DeFelice
> > >              Labels: mesosphere
> > >
> > > for details, log snippets see
> > https://github.com/mesosphere/kubernetes-mesos/issues/328
> > > The slave logs that it's been asked to kill a pod, but the message is
> > never logged as received by the executor.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
> >
>



-- 
James DeFelice
585.241.9488 (voice)
650.649.6071 (fax)
