Thanks for your explanation, Antoine. I figured out why I was facing the memory leak and needed to call `del` explicitly; my example code may have been misleading. The key problem is that, when wrapping the code that converts a Java stream into a RecordBatchReader, I create a child allocator from the current context (living only as long as the RecordBatchReader) and call exportArrayStream inside a generator. As a result, the consumer/callback always outlives the RecordBatchReader and its underlying allocator (not the allocator of the Java stream, but the one passed to exportArrayStream).
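To make the lifetime issue concrete, here is a rough pure-Python model of it, not the real setup. Nothing below is JPype or Arrow API: `Allocator`, `Buffer`, `LeakError`, `batches_scoped_to_reader`, `batches_from` and `consume_with_long_lived_allocator` are hypothetical stand-ins, with `Allocator.close()` playing the role of the Java allocator's leak check and `Buffer.release()` standing in for the C Data release callback.

```python
class LeakError(RuntimeError):
    """Raised when an allocator is closed while buffers are still outstanding."""


class Allocator:
    """Hypothetical stand-in for a Java BufferAllocator (not the real Arrow API)."""

    def __init__(self, name):
        self.name = name
        self.outstanding = 0

    def allocate(self):
        # Hand out a buffer and remember that it has not been released yet.
        self.outstanding += 1
        return Buffer(self)

    def close(self):
        # Mimics the Java-side leak check: closing with live buffers is an error.
        if self.outstanding:
            raise LeakError(f"{self.name}: {self.outstanding} buffer(s) leaked")


class Buffer:
    """Hypothetical record batch whose memory is owned by an Allocator."""

    def __init__(self, allocator):
        self._allocator = allocator
        self.released = False

    def release(self):
        # Stands in for the C Data release callback; safe to call twice.
        if not self.released:
            self.released = True
            self._allocator.outstanding -= 1


def batches_scoped_to_reader(n):
    # Problematic pattern: the allocator only lives as long as the generator,
    # so it is closed as soon as the stream is exhausted.
    allocator = Allocator("reader-scoped")
    try:
        for _ in range(n):
            yield allocator.allocate()
    finally:
        allocator.close()  # the consumer may still be holding batches here


def batches_from(allocator, n):
    # Fixed pattern: the caller owns the allocator and decides when to close it.
    for _ in range(n):
        yield allocator.allocate()


def consume_with_long_lived_allocator(n):
    allocator = Allocator("caller-scoped")
    held = list(batches_from(allocator, n))  # the consumer keeps references
    for batch in held:
        batch.release()  # every batch is released before the allocator closes
    allocator.close()  # no outstanding buffers, so no LeakError
    return allocator.outstanding


if __name__ == "__main__":
    try:
        list(batches_scoped_to_reader(3))
    except LeakError as exc:
        print("reader-scoped allocator:", exc)
    print("caller-scoped allocator, outstanding:", consume_with_long_lived_allocator(3))
```

In the first pattern the `finally` block closes the allocator the moment the stream is exhausted, while the consumer still holds every batch, so the leak check fires. In the second, the allocator's lifetime is tied to the consumer rather than to the reader.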
When I instead pass a longer-lived allocator (one that lives as long as the consumer/callback) to the conversion, the code works as expected.

Antoine Pitrou <anto...@python.org> wrote on Thursday, June 29, 2023 at 17:55:
>
>
> On 29/06/2023 at 09:50, Wenbo Hu wrote:
> > Hi,
> >
> > I'm using Jpype to pass streams between Java and Python back and forth.
> >
> > The following code works fine with its release callback:
> > ```python
> >
> > with child_allocator("test-allocator") as allocator:
> >     r = some_package.InMemoryArrowReader.create(allocator)
> >     c_stream = arrow_c.new("struct ArrowArrayStream*")
> >     c_stream_ptr = int(arrow_c.cast("uintptr_t", c_stream))
> >
> >     s = org.apache.arrow.c.ArrowArrayStream.wrap(c_stream_ptr)
> >     org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)
> >
> >     with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
> >         for rb in stream:  # type: pa.RecordBatch
> >             logger.info(rb.num_rows)  # yield weakref.proxy(rb)
> >             del rb  # release callback directly called in current thread?
> > ```
> >
> > But if the del statement is not called, the allocator on the Java side
> > raises an exception reporting a memory leak.
>
> That's not surprising. The `rb` variable keeps the record batch and its
> backing memory alive. This is the expected semantics. Otherwise,
> accessing `rb`'s contents would crash.
>
> > Also, a warning message is printed to stderr:
> > ```
> > WARNING: Failed to release Java C Data resource: Failed to attach the
> > current thread to a Java VM
> > ```
>
> That's probably because the release callback is called at process exit,
> after the JVM is shut down?
>
> > Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
> > RecordBatchReader work with other pyarrow APIs (dataset)?
>
> That would probably not solve the problem. Users can trivially get a
> strong reference from the weakref, and keep it alive too long.
>
> Regards
>
> Antoine.

--
---------------------
Best Regards,
Wenbo Hu