zeroshade commented on code in PR #547:
URL: https://github.com/apache/arrow-go/pull/547#discussion_r2466277983


##########
parquet/file/page_reader.go:
##########
@@ -501,7 +504,16 @@ func (p *serializedPageReader) Page() Page {
 }
 
 func (p *serializedPageReader) decompress(rd io.Reader, lenCompressed int, buf []byte) ([]byte, error) {
-       p.decompressBuffer.ResizeNoShrink(lenCompressed)
+       // As of go1.25.3: There is an issue when bytes.Buffer and io.CopyN are used together. io.CopyN
+       // uses io.LimitReader, which does an additional read on the underlying reader to determine EOF.
+       // However, bytes.Buffer always attempts to read at least bytes.MinRead (which is 512 bytes) from the
+       // underlying reader, even if there is less data available than that. So even if there are no more bytes,
+       // the buffer must have at least bytes.MinRead capacity remaining to avoid a relocation.
+       allocSize := lenCompressed
+       if p.decompressBuffer.Cap() < lenCompressed+bytes.MinRead {
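
For context, a small standalone program (illustrative only, not code from this PR) showing the `bytes.Buffer` / `io.CopyN` interaction the new comment describes: `ReadFrom` reserves `bytes.MinRead` before the final read that reports EOF, so a buffer sized exactly to the payload gets reallocated.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

func main() {
	payload := strings.Repeat("x", 4096)

	var buf bytes.Buffer
	buf.Grow(len(payload)) // capacity sized exactly to the payload
	before := buf.Cap()

	// io.CopyN wraps the source in an io.LimitReader, and bytes.Buffer.ReadFrom
	// reserves bytes.MinRead (512 bytes) before the Read that reports EOF, so
	// the buffer reallocates even though no additional data ever arrives.
	if _, err := io.CopyN(&buf, strings.NewReader(payload), int64(len(payload))); err != nil {
		panic(err)
	}

	fmt.Println("cap before:", before, "cap after:", buf.Cap())
}
```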

Review Comment:
   I agree that this seems really fragile. Maybe `io.ReadFull` directly into 
`p.decompressBuffer.Bytes()[:lenCompressed]` instead of using the intermediate 
`bytes.Buffer`?
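
A minimal sketch of that suggestion, using hypothetical names (`readCompressed`, `scratch`) rather than the actual `page_reader.go` fields: grow the reusable buffer once, then `io.ReadFull` exactly `lenCompressed` bytes into its backing array, so neither `io.CopyN` nor the `bytes.MinRead` headroom is involved.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readCompressed is a hypothetical helper (not the actual arrow-go code): it
// reads exactly lenCompressed bytes from rd into the backing array of a
// reusable bytes.Buffer, bypassing io.CopyN and its LimitReader probe read.
func readCompressed(rd io.Reader, lenCompressed int, scratch *bytes.Buffer) ([]byte, error) {
	scratch.Reset()
	scratch.Grow(lenCompressed) // guarantees capacity >= lenCompressed
	// Bytes() is empty after Reset, but re-slicing within capacity is valid;
	// the Buffer serves only as a reusable backing array here.
	dst := scratch.Bytes()[:lenCompressed]
	if _, err := io.ReadFull(rd, dst); err != nil {
		return nil, err
	}
	return dst, nil
}

func main() {
	var scratch bytes.Buffer
	src := bytes.NewReader(make([]byte, 64)) // stands in for the page stream
	data, err := readCompressed(src, 64, &scratch)
	if err != nil {
		panic(err)
	}
	fmt.Println("read", len(data), "bytes; buffer cap:", scratch.Cap())
}
```

The caveat with this shape is that the data lives in the Buffer's backing array while its logical length stays zero, so nothing else should write to `scratch` while `dst` is still in use.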


