zeroshade commented on code in PR #13068:
URL: https://github.com/apache/arrow/pull/13068#discussion_r865950160
##########
go/parquet/file/page_reader.go:
##########
@@ -611,8 +610,6 @@ func (p *serializedPageReader) Next() bool {
// we don't know this page type, we're allowed to skip
non-data pages
continue
}
-
- p.buf = memory.NewResizableBuffer(p.mem)
Review Comment:
Ah I see what was happening. I passed the raw bytes as a *new* buffer to the
page itself. So technically even though the accounting in the memory allocator
is now correct, if the allocator isn't a Go allocator this will actually end up
using the buffer after it's been freed. if you look at line 596, it calls
`memory.NewBufferBytes(data)` to pass the buffer to the datapage itself.
`NewBufferBytes` constructs a buffer which wraps around a byte slice, but it
doesn't attempt to *free* that slice when it is garbage collected and assumes
that the Go garbage collector will handle it.
This is all fine for a byteslice that is allocated by Go itself, but since
this is a buffer coming from the resizable buffer allocated using the
`memory.Allocator` it's possible that this memory wasn't allocated by Go at all.
Because data pages were always created using `NewBufferBytes` I don't
believe I added a `Release` function to them. At this point I think the safest
solution is to add a `Release` to the `DataPage` interface and corresponding
structs, ensure it gets called by the page reader when it sets the current page
to nil or otherwise, and just pass the buffer as is to the page rather than
calling `NewBufferBytes`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]