[ https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619417#comment-16619417 ]
Alexander Rukletsov edited comment on MESOS-8545 at 9/21/18 12:56 PM:
----------------------------------------------------------------------

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik <abud...@mesosphere.com>
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov <al...@apache.org>
CommitDate: Tue Sep 18 19:10:14 2018 +0200

    Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

    Previously, the IOSwitchboard process could terminate before all HTTP
    responses had been sent to the agent. In the case of the
    `ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
    response, so the agent got a broken HTTP connection for the call.

    This patch introduces an acknowledgment for the received response
    for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
    type of control message for the `ATTACH_CONTAINER_INPUT` call. When
    IOSwitchboard receives an acknowledgment, and IO redirects are
    finished, it terminates itself. That guarantees that the agent
    always receives a response for the `ATTACH_CONTAINER_INPUT` call.

    Review: https://reviews.apache.org/r/65168/
{noformat}
{noformat}
commit bfa2bd24780b5c49467b3c23260855e3d8b4c948
Author: Andrei Budnik <abud...@mesosphere.com>
AuthorDate: Fri Sep 21 14:51:24 2018 +0200
Commit: Alexander Rukletsov <al...@apache.org>
CommitDate: Fri Sep 21 14:51:24 2018 +0200

    Fixed disconnection while sending acknowledgment to IOSwitchboard.

    Previously, an HTTP connection to the IOSwitchboard could be garbage
    collected before the agent sent an acknowledgment to the
    IOSwitchboard via this connection. This patch fixes the issue by
    keeping a reference count to the connection in a lambda callback
    until disconnection occurs.

    Review: https://reviews.apache.org/r/68768/
{noformat}
{noformat}
commit c3c77cbef818d497d8bd5e67fa72e55a7190e27a
Author: Andrei Budnik <abud...@mesosphere.com>
AuthorDate: Fri Sep 21 14:51:59 2018 +0200
Commit: Alexander Rukletsov <al...@apache.org>
CommitDate: Fri Sep 21 14:51:59 2018 +0200

    Fixed broken pipe error in IOSwitchboard.

    The previous attempt to fix the `HTTP 500` "broken pipe" error in
    review /r/62187/ was not correct: after IOSwitchboard sends a response
    to the agent for the `ATTACH_CONTAINER_INPUT` call, the socket is
    closed immediately, thus causing the error on the agent. This patch
    adds a delay after IO redirects are finished and before IOSwitchboard
    forcibly sends a response.

    Review: https://reviews.apache.org/r/68784/
{noformat}

*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}

*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
{noformat}

*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
{noformat}

> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
> -------------------------------------------------------------------
>
>                 Key: MESOS-8545
>                 URL: https://issues.apache.org/jira/browse/MESOS-8545
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.5.0, 1.6.1, 1.7.0
>            Reporter: Andrei Budnik
>            Assignee: Andrei Budnik
>            Priority: Major
>              Labels: Mesosphere, flaky-test
>             Fix For: 1.5.2, 1.6.2, 1.7.1, 1.8.0
>
>         Attachments:
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun.txt,
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun2.txt
>
>
> {code:java}
> I0205 17:11:01.091872 4898 http_proxy.cpp:132] Returning '500 Internal Server Error' for '/slave(974)/api/v1' (Disconnected)
> /home/centos/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-centos-7/mesos/src/tests/api_tests.cpp:6596: Failure
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)