fishy commented on pull request #2293: URL: https://github.com/apache/thrift/pull/2293#issuecomment-745709232
OK I cannot reproduce this issue with go1.15.6 locally. Looking through travis log we were using go 1.10.8 to run that test, and I do can reproduce this issue with go1.10.8 almost 100% locally. My current understanding is that in the test we are relying on canceled context to make the http request fail: https://github.com/apache/thrift/blob/e872b350ab0cfd2402340b8ca82b02dc299db69e/lib/go/thrift/http_client_test.go#L129 This works fine in go1.15.6. In go1.10.8 although it did make the `Flush` call next line fail, it doesn't really cause the write loop of the http client to stop correctly. So after you reset the `requestBuffer` inside `Flush`, the write loop of the http client still tries to read from `requestBuffer` in order to write the request, and that caused both data race and the test failure. I think the big, underlying problem is that when making the http request inside `Flush`, we passed the `requestBuffer` to the http request (https://github.com/apache/thrift/blob/e872b350ab0cfd2402340b8ca82b02dc299db69e/lib/go/thrift/http_client.go#L208). This doesn't make a copy of the buffer anywhere, so the ownership of the buffer becomes muddy: `THttpClient` thinks it owns the data and can call its `Reset`, while the http request also thinks it owns the data and it's the only one reading/writing from it. I think the proper fix would be that we need to make a copy of the buffer before sending it to http request. It will hurt performance, though. @dcelasun what do you think? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
