Leitzler created THRIFT-5828:
--------------------------------

             Summary: ReadBinary in the binary protocol implementation over-allocates
                 Key: THRIFT-5828
                 URL: https://issues.apache.org/jira/browse/THRIFT-5828
             Project: Thrift
          Issue Type: Bug
          Components: Go - Library
    Affects Versions: 0.21.0, 0.20.0, 0.19.0, 0.18.1, 0.18.0, 0.17.0, 0.16.0, 0.14.2, 0.14.1, 0.15.0, 0.14.0
         Environment: Benchmarked using macOS w/ M1 Ultra (arm64).
            Reporter: Leitzler
*Background*
----
Prior to 0.14.0, the binary protocol implementation of ReadBinary(..) in Go allocated as much memory as the caller asked for, causing CVE-2019-11939. This was fixed in [PR #2292|https://github.com/apache/thrift/pull/2292] where we now, instead of allocating everything up front:
{code:go}
buf := make([]byte, isize)
_, err := io.ReadFull(p.trans, buf)
{code}
.. use io.CopyN:
{code:go}
buf := new(bytes.Buffer)
_, err := io.CopyN(buf, trans, int64(size))
{code}
.. which prevents allocating an insanely large amount of memory for a malformed message.

*The issue*
----
The combination of bytes.Buffer and io.CopyN, however, comes with a subtle caveat: it ensures that we _always_ over-allocate. If the destination writer provided to [io.CopyN|https://pkg.go.dev/io#CopyN] implements io.ReaderFrom, which bytes.Buffer does, the copy will use that. Inside [ReadFrom(...) in bytes.Buffer|https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/bytes/buffer.go;l=200-225], the buffer is unconditionally grown so that at least 512 bytes fit before each Read(..) call, and each time the internal slice of a bytes.Buffer grows, it doubles in size. This ultimately causes additional allocations, as shown by this benchmark test:
{code:go}
package main

import (
	"bytes"
	"crypto/rand"
	"io"
	"testing"
)

var tt = []struct {
	dataSize  int
	askedSize int32
}{
	{100, 100},
	{32 * 1024, 32 * 1024},               // 32k
	{64 * 1024, 64 * 1024},               // 64k
	{512 * 1024, 512 * 1024},             // 512k
	{1024 * 1024, 1024 * 1024},           // 1M
	{5 * 1024 * 1024, 5 * 1024 * 1024},   // 5M
	{10 * 1024 * 1024, 10 * 1024 * 1024}, // 10M
	{40 * 1024 * 1024, 40 * 1024 * 1024}, // 40M
}

func TestCurrent(t *testing.T) {
	for _, tc := range tt {
		bm := testing.Benchmark(func(b *testing.B) {
			data := make([]byte, tc.dataSize)
			if _, err := io.ReadFull(rand.Reader, data); err != nil {
				b.Fatal(err)
			}
			b.ResetTimer()
			for range b.N {
				safeReadBytes(tc.askedSize, bytes.NewReader(data))
			}
		})
		t.Logf("%s - ask: %d, data: %d", bm.MemString(), tc.askedSize, tc.dataSize)
	}
}
{code}
{code}
main_test.go:36: 1656 B/op 5 allocs/op - ask: 100, data: 100
main_test.go:36: 130680 B/op 11 allocs/op - ask: 32768, data: 32768
main_test.go:36: 261753 B/op 12 allocs/op - ask: 65536, data: 65536
main_test.go:36: 2096768 B/op 15 allocs/op - ask: 524288, data: 524288
main_test.go:36: 4193923 B/op 16 allocs/op - ask: 1048576, data: 1048576
main_test.go:36: 16776833 B/op 18 allocs/op - ask: 5242880, data: 5242880
main_test.go:36: 33554060 B/op 19 allocs/op - ask: 10485760, data: 10485760
main_test.go:36: 134217366 B/op 21 allocs/op - ask: 41943040, data: 41943040
{code}
As the actual data size grows, so does the over-allocation.
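To make the benchmark self-contained: the safeReadBytes helper it exercises is not spelled out above. A minimal sketch of the current io.CopyN-based helper from PR #2292 that can be pasted next to the benchmark (the name and signature are assumed to match the library's internal helper):
{code:go}
// Sketch of the current io.CopyN-based helper the benchmark calls.
// bytes.Buffer implements io.ReaderFrom, so io.CopyN delegates to
// Buffer.ReadFrom, whose grow-by-doubling behaviour is the source of
// the over-allocation measured above.
func safeReadBytes(size int32, trans io.Reader) ([]byte, error) {
	if size < 0 {
		return nil, nil
	}

	buf := new(bytes.Buffer)
	_, err := io.CopyN(buf, trans, int64(size))
	return buf.Bytes(), err
}
{code}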
*Improvement*
----
What we could do is use the same approach that ReadString() used prior to v0.14.0, slightly modified to behave like the current solution:
{code:go}
func safeReadBytes(size int32, trans io.Reader) ([]byte, error) {
	if size < 0 {
		return nil, nil
	}

	const readLimit = 32 * 1024
	if size < readLimit {
		// Small payloads: allocate exactly once.
		b := make([]byte, size)
		n, err := io.ReadFull(trans, b)
		return b[:n], err
	}

	// Large payloads: read in readLimit-sized chunks so a malformed
	// size can never force a huge up-front allocation.
	b := make([]byte, readLimit)
	var buf bytes.Buffer
	var e error
	var n int
	for size > 0 {
		n, e = io.ReadFull(trans, b)
		buf.Write(b[:n])
		if e != nil {
			break
		}
		size -= readLimit
		if size < readLimit && size > 0 {
			// Shrink the chunk for the final partial read.
			b = b[:size]
		}
	}
	return buf.Bytes(), e
}
{code}
It reduces the number of allocations, and the amount of memory allocated for small payloads:
{code}
main_test.go:44: 160 B/op 2 allocs/op - ask: 100, data: 100
main_test.go:44: 65584 B/op 3 allocs/op - ask: 32768, data: 32768
main_test.go:44: 131120 B/op 4 allocs/op - ask: 65536, data: 65536
main_test.go:44: 1048628 B/op 7 allocs/op - ask: 524288, data: 524288
main_test.go:44: 2097203 B/op 8 allocs/op - ask: 1048576, data: 1048576
main_test.go:44: 16777267 B/op 11 allocs/op - ask: 5242880, data: 5242880
main_test.go:44: 33554481 B/op 12 allocs/op - ask: 10485760, data: 10485760
main_test.go:44: 134217777 B/op 14 allocs/op - ask: 41943040, data: 41943040
{code}
But as the payload size increases we still over-allocate, producing a lot of garbage for the garbage collector.

*Proposal*
----
The whole purpose of the original change was to avoid allocating a lot of memory when the message is malformed, an edge case. I propose that we optimise for the ordinary case instead, and let the edge case perform slightly worse, by using a large read limit. Ideally we would use the TConfiguration that was added after PR #2292 (in [PR #2296|https://github.com/apache/thrift/pull/2296]); it already defines a default max message size of 100MB, for example. Since the max message size is already checked before reading, ReadBinary could be changed back to always allocating the full size.

If the max message size is considered too large, and we cannot add a new option to TConfiguration (as it must stay generic), I propose that we use a large read limit instead. Using 10MB instead of 32k in the example above would improve the ordinary flow a lot for payloads up to 10MB:
{code}
main_test.go:44: 160 B/op 2 allocs/op - ask: 100, data: 100
main_test.go:44: 32816 B/op 2 allocs/op - ask: 32768, data: 32768
main_test.go:44: 65584 B/op 2 allocs/op - ask: 65536, data: 65536
main_test.go:44: 524338 B/op 2 allocs/op - ask: 524288, data: 524288
main_test.go:44: 1048625 B/op 2 allocs/op - ask: 1048576, data: 1048576
main_test.go:44: 5242930 B/op 2 allocs/op - ask: 5242880, data: 5242880
main_test.go:44: 20971574 B/op 3 allocs/op - ask: 10485760, data: 10485760
main_test.go:44: 83886131 B/op 5 allocs/op - ask: 41943040, data: 41943040
{code}
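For illustration, a minimal sketch of the TConfiguration-based variant described above, assuming safeReadBytes can be given access to the protocol's TConfiguration (the extra cfg parameter and the error handling here are hypothetical, not the library's actual API surface):
{code:go}
package sketch

import (
	"fmt"
	"io"

	"github.com/apache/thrift/lib/go/thrift"
)

// Hypothetical sketch: TConfiguration already enforces a max message size
// (100MB by default), so any size within that limit can safely be
// allocated up front in a single allocation.
func safeReadBytes(size int32, trans io.Reader, cfg *thrift.TConfiguration) ([]byte, error) {
	if size < 0 {
		return nil, nil
	}
	if size > cfg.GetMaxMessageSize() {
		// Malformed or hostile frame: refuse instead of allocating `size` bytes.
		return nil, fmt.Errorf("declared binary size %d exceeds max message size", size)
	}
	buf := make([]byte, size)
	n, err := io.ReadFull(trans, buf)
	return buf[:n], err
}
{code}
The fallback variant is simply the Improvement code above with readLimit raised to 10 * 1024 * 1024, which is what produced the numbers shown.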