MESOS-2865: executor log showing a byte dump of the HTTP requests. Note that task-2 is never launched even though the whole frame is read by Go's net/http library.

Gist: https://gist.github.com/jdef/40a550db373c73298af5
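
For anyone who wants to capture a similar byte dump, here is a minimal sketch of an io.Reader spy. It is not the code that produced the gist: `spyReader` and `dump` are hypothetical names, and it assumes Go 1.16 or later for `io.ReadAll`. In a real server the same wrapper would presumably sit around the `net.Conn` handed out by the listener rather than around a string reader.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// spyReader (hypothetical name) wraps an io.Reader and logs every block of
// bytes that passes through Read, without altering the data.
type spyReader struct {
	r   io.Reader
	log io.Writer
}

func (s *spyReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	if n > 0 {
		// %q keeps CR/LF and other control bytes visible in the log.
		fmt.Fprintf(s.log, "read %d bytes: %q\n", n, p[:n])
	}
	return n, err
}

// dump reads src to EOF through a spyReader and returns both the payload
// the consumer saw and the log the spy produced.
func dump(src string) (payload, logged string) {
	var log bytes.Buffer
	data, _ := io.ReadAll(&spyReader{r: strings.NewReader(src), log: &log})
	return string(data), log.String()
}

func main() {
	payload, logged := dump("POST /executor HTTP/1.1\r\nContent-Length: 0\r\n\r\n")
	fmt.Print(logged)
	fmt.Println("payload length:", len(payload))
}
```

The spy only proves that the bytes left the socket; it says nothing about what a buffering layer above it does with them afterwards, which is exactly the gap in question here.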
On Fri, Aug 21, 2015 at 5:29 PM, Vetoshkin Nikita <[email protected]> wrote:

> Can you show the bytes you captured? Maybe there is an issue with HTTP body
> reading and golang HTTP library thinks that there's more to come.
>
> On Fri, Aug 21, 2015, 13:59 James DeFelice (JIRA) <[email protected]> wrote:
>
> >
> > [
> > https://issues.apache.org/jira/browse/MESOS-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707454#comment-14707454
> > ]
> >
> > James DeFelice commented on MESOS-2865:
> > ---------------------------------------
> >
> > Pretty sure the problem is related to Go's internal net/http library
> > buffering. Here's what is happening:
> > - mesos POSTs to a URL, with keep-alive
> > - Go's net/http (net.http) reads the first request
> > - the mesos-go handler sends a response
> > - net.http reads the second request (same pipeline)
> > - the mesos-go handler sends a response
> > - net.http reads the third request, but doesn't deliver it to the mesos-go
> >   handler (even though the full request frame was read)
> > - the mesos-go handler waits forever because there's no timeout on the
> >   connection
> > - the undelivered frame is never sent to the message handler
> >
> > I've verified that the bytes have been read from the connection because I
> > added an io.Reader spy that logs all read byte blocks to stdout. It's very
> > clear that the entire message has been received by net.http but for some
> > reason it's buffering/hoarding the 3rd request frame. This happens in
> > go-1.4.2, and in the just-released go-1.5.
> >
> > I've tested this further by writing a special net.http Handler that
> > bootstraps from Go's net.http server but hijacks the connection of the
> > initial request immediately and assumes total control over the message
> > framing from thereon. I'm unable to reproduce the lost message effect with
> > the mini http server.
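
To make the hijacking approach described above concrete, here is a rough sketch of a handler that takes over the connection on the first request and then frames every later request itself. It is an illustration under assumptions, not the handler used in mesos-go: `hijackHandler`, `reply`, `postAll`, `echoOverHijackedConn` and the `/msg` path are all hypothetical, the handler simply echoes each body back, and it assumes Go 1.16 or later for `io.ReadAll`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
)

// reply writes a minimal HTTP/1.1 response that echoes body back.
func reply(w *bufio.Writer, body []byte) error {
	fmt.Fprintf(w, "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n", len(body))
	w.Write(body)
	return w.Flush()
}

// hijackHandler bootstraps from net/http for the first request only, then
// reads and answers every later request on the raw connection itself.
func hijackHandler(w http.ResponseWriter, r *http.Request) {
	// Drain the first body before hijacking; Request.Body must not be used
	// after Hijack.
	first, err := io.ReadAll(r.Body)
	if err != nil {
		return
	}
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, rw, err := hj.Hijack()
	if err != nil {
		return
	}
	defer conn.Close()
	if reply(rw.Writer, first) != nil {
		return
	}
	for {
		// rw.Reader may already hold buffered bytes of the next request.
		req, err := http.ReadRequest(rw.Reader)
		if err != nil {
			return // io.EOF once the client closes the connection
		}
		body, err := io.ReadAll(req.Body)
		if err != nil || reply(rw.Writer, body) != nil {
			return
		}
	}
}

// postAll sends each message as a POST over one keep-alive connection and
// returns the echoed bodies in order.
func postAll(addr string, msgs []string) ([]string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	br := bufio.NewReader(conn)
	var got []string
	for _, m := range msgs {
		fmt.Fprintf(conn, "POST /msg HTTP/1.1\r\nHost: x\r\nContent-Length: %d\r\n\r\n%s", len(m), m)
		resp, err := http.ReadResponse(br, nil)
		if err != nil {
			return got, err
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return got, err
		}
		got = append(got, string(b))
	}
	return got, nil
}

// echoOverHijackedConn starts a throwaway test server and runs postAll against it.
func echoOverHijackedConn(msgs []string) ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(hijackHandler))
	defer srv.Close()
	return postAll(srv.Listener.Addr().String(), msgs)
}

func main() {
	got, err := echoOverHijackedConn([]string{"task-0", "task-1", "task-2"})
	fmt.Println(got, err)
}
```

The point of the design is that after `Hijack` the only buffering between the socket and the message handler is the `bufio.ReadWriter` the handler itself holds, so a request that has been read cannot sit undelivered in a layer the handler does not control.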
> >
> > > intermittently the executor is not receiving TASK_KILLED
> > > --------------------------------------------------------
> > >
> > >                 Key: MESOS-2865
> > >                 URL: https://issues.apache.org/jira/browse/MESOS-2865
> > >             Project: Mesos
> > >          Issue Type: Bug
> > >    Affects Versions: 0.21.1, 0.23.0
> > >         Environment: {code}
> > > $ dpkg -l |grep -e mesos
> > > ii  mesos  0.21.1-1.1.ubuntu1404  amd64  Cluster resource manager with efficient resource isolation
> > > $ uname -a
> > > Linux node-1 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> > > {code}
> > >            Reporter: James DeFelice
> > >              Labels: mesosphere
> > >
> > > for details, log snippets see
> > > https://github.com/mesosphere/kubernetes-mesos/issues/328
> > > The slave logs that it's been asked to kill a pod, but the message is
> > > never logged as received by the executor.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
> >
>

--
James DeFelice
585.241.9488 (voice)
650.649.6071 (fax)
