fishy commented on pull request #2295: URL: https://github.com/apache/thrift/pull/2295#issuecomment-746667382
@dcelasun so if you look at the data race output (note that for stdlib the line numbers are for [go1.10.8](https://github.com/golang/go/tree/go1.10.8)): ``` ================== WARNING: DATA RACE Read at 0x00c42030e1d8 by goroutine 14: bytes.(*Buffer).Read() /usr/local/go/src/bytes/buffer.go:70 +0x80 io/ioutil.(*nopCloser).Read() <autogenerated>:1 +0x7d net/http.transferBodyReader.Read() /usr/local/go/src/net/http/transfer.go:60 +0x75 io.(*LimitedReader).Read() /usr/local/go/src/io/io.go:446 +0xca bufio.(*Writer).ReadFrom() /usr/local/go/src/bufio/bufio.go:703 +0x13b io.copyBuffer() /usr/local/go/src/io/io.go:386 +0x417 io.Copy() /usr/local/go/src/io/io.go:362 +0x74 net/http.(*transferWriter).WriteBody() /usr/local/go/src/net/http/transfer.go:337 +0x98a net/http.(*Request).write() /usr/local/go/src/net/http/request.go:622 +0x797 net/http.(*persistConn).writeLoop() /usr/local/go/src/net/http/transport.go:1825 +0x2e0 Previous write at 0x00c42030e1d8 by goroutine 31: _/thrift/src/lib/go/thrift.(*THttpClient).Flush.func1() /usr/local/go/src/bytes/buffer.go:100 +0xaf _/thrift/src/lib/go/thrift.(*THttpClient).Flush() /thrift/src/lib/go/thrift/http_client.go:218 +0x4cd _/thrift/src/lib/go/thrift.TestHTTPClientFlushesRequestBufferOnErrors() /thrift/src/lib/go/thrift/http_client_test.go:130 +0x303 testing.tRunner() /usr/local/go/src/testing/testing.go:777 +0x16d Goroutine 14 (running) created at: net/http.(*Transport).dialConn() /usr/local/go/src/net/http/transport.go:1238 +0xb4a net/http.(*Transport).getConn.func4() /usr/local/go/src/net/http/transport.go:957 +0x9f Goroutine 31 (running) created at: testing.(*T).Run() /usr/local/go/src/testing/testing.go:824 +0x564 testing.runTests.func1() /usr/local/go/src/testing/testing.go:1063 +0xa4 testing.tRunner() /usr/local/go/src/testing/testing.go:777 +0x16d testing.runTests() /usr/local/go/src/testing/testing.go:1061 +0x4e1 testing.(*M).Run() /usr/local/go/src/testing/testing.go:978 +0x2cd main.main() _testmain.go:332 +0x22a ================== ``` It's that goroutine 31 (the one running the test) called the deferred `Reset` on the buffer inside `Flush` function, while goroutine 14 is trying to read from the buffer. The thing is that unlike most other `TTransport` implementations that all the I/O happened on the same goroutine, it's not the case for the http clients. If you follow the line numbers from the call stack, you'll notice that the http client actually spawned 2 goroutines for read and write (goroutine 14 is the one [for write](https://github.com/golang/go/blob/b0cb374daf646454998bac7b393f3236a2ab6aca/src/net/http/transport.go#L1238), as in it reads from the input (buffer) and writes them to the wire). So what happened is: 1. The "main" goroutine (the one from the test) inside http client found out that the context is already canceled, *after* it already spawned the goroutines for the read and write loops. It returns the error after the context cancellation check (and triggered the deferred `Reset`) 2. In the mean time, the write loop goroutine haven't done the cancellation check yet, so it still tries to read from the buffer. It's flaky on travis likely due to the resource limitation on the environment, that the main goroutine runs a lot slower, so when it detects context cancellation the write loop already did the check. I can almost reproduce it 100% on my local machine with go1.10.8 because I have more cores and the race condition is much more likely to happen. After go1.10.8 there're likely some change in stdlib http client to do the cancellation check earlier (I haven't dig into the details yet), but it doesn't mean it's safe to call `Reset` after we handed over the buffer to http client. Inside http client they have locks and stuff to guarantee that two goroutines don't try to access the buffer at the same time, but outside of that (in thrift's `THttpClient.Flush`) we are not part of that coordination. We should just give up the ownership of the buffer instead. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
