daniel-adam-tfs commented on issue #540: URL: https://github.com/apache/arrow-go/issues/540#issuecomment-3437951288
So I switched to main and redid the memory profile. If I understand https://github.com/apache/arrow-go/commit/f0b6fd9eacfd244cdef200a6115873e8279f4297 correctly, the point was for `serializedPageReader.decompress` to allocate memory using the memory package. However, the call to `io.CopyN` might still trigger a reallocation; in my memory profile it did so 2/3 of the time.

The reason is that `io.CopyN` wraps the source in a `LimitReader` https://github.com/golang/go/blob/39ed968832ad8923a4bd1fb6bc3d9090ddd98401/src/io/io.go#L364C27-L364C38 and continues to https://github.com/golang/go/blob/39ed968832ad8923a4bd1fb6bc3d9090ddd98401/src/io/io.go#L415 which in our case goes through `ReadFrom` in https://github.com/golang/go/blob/39ed968832ad8923a4bd1fb6bc3d9090ddd98401/src/bytes/buffer.go#L215. The loop there is causing the issue: it exits only when a read returns EOF, but `LimitReader` never returns EOF on the first read, even when that read delivers everything in one go. So `ReadFrom` runs one more iteration, and that iteration requires at least `MinRead` (=512) bytes still available in the buffer, which we don't have.

TL;DR: you need to size the decompress buffer to the desired size + `MinRead` to avoid the reallocation. 🏆
