zeroshade commented on code in PR #13068:
URL: https://github.com/apache/arrow/pull/13068#discussion_r865950160


##########
go/parquet/file/page_reader.go:
##########
@@ -611,8 +610,6 @@ func (p *serializedPageReader) Next() bool {
                        // we don't know this page type, we're allowed to skip 
non-data pages
                        continue
                }
-
-               p.buf = memory.NewResizableBuffer(p.mem)

Review Comment:
   Ah I see what was happening. I passed the raw bytes as a *new* buffer to the 
page itself. So technically even though the accounting in the memory allocator 
is now correct, if the allocator isn't a Go allocator this will actually end up 
using the buffer after it's been freed. if you look at line 596, it calls 
`memory.NewBufferBytes(data)` to pass the buffer to the datapage itself. 
`NewBufferBytes` constructs a buffer which wraps around a byte slice, but it 
doesn't attempt to *free* that slice when it is garbage collected and assumes 
that the Go garbage collector will handle it. 
   
   This is all fine for a byteslice that is allocated by Go itself, but since 
this is a buffer coming from the resizable buffer allocated using the 
`memory.Allocator` it's possible that this memory wasn't allocated by Go at all.
   
   Because data pages were always created using `NewBufferBytes` I don't 
believe I added a `Release` function to them. At this point I think the safest 
solution is to add a `Release` to the `DataPage` interface and corresponding 
structs, ensure it gets called by the page reader when it sets the current page 
to nil or otherwise, and just pass the buffer as is to the page rather than 
calling `NewBufferBytes`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to