On 29/06/2023 at 09:50, Wenbo Hu wrote:
> Hi,
> I'm using JPype to pass streams back and forth between Java and Python.
> The following code works fine with its release callback:
> ```python
> with child_allocator("test-allocator") as allocator:
>     r = some_package.InMemoryArrowReader.create(allocator)
>     c_stream = arrow_c.new("struct ArrowArrayStream*")
>     c_stream_ptr = int(arrow_c.cast("uintptr_t", c_stream))
>     s = org.apache.arrow.c.ArrowArrayStream.wrap(c_stream_ptr)
>     org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)
>     with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
>         for rb in stream:  # type: pa.RecordBatch
>             logger.info(rb.num_rows)  # yield weakref.proxy(rb)
>             del rb  # release callback directly called in current thread?
> ```
> But if the del statement is not called, the allocator on the Java side
> raises an exception about a memory leak.
That's not surprising. The `rb` variable keeps the record batch and its
backing memory alive. This is the expected semantics. Otherwise,
accessing `rb`'s contents would crash.
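
A minimal sketch of these semantics, using a hypothetical `FakeBatch` class in place of a real `pa.RecordBatch` and assuming CPython's reference counting. The `released` list and `fake_stream` generator are illustrative names, not part of pyarrow:

```python
import weakref

released = []  # names of batches whose release callback has run


class FakeBatch:
    """Hypothetical stand-in for a record batch backed by foreign memory."""

    def __init__(self, name):
        self.name = name
        # The callback runs once the object is garbage collected
        weakref.finalize(self, released.append, name)


def fake_stream():
    for i in range(3):
        yield FakeBatch(f"batch-{i}")


for rb in fake_stream():
    pass  # rebinding rb on each iteration releases the previous batch

# The loop variable still references the last batch here
after_loop = list(released)

del rb  # dropping the last reference triggers the final release
after_del = list(released)
```

In this sketch the last batch is only released once `rb` is deleted or goes out of scope, which matches the leak reported when the allocator is closed first.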
> Also, a warning message is written to stderr:
> ```
> WARNING: Failed to release Java C Data resource: Failed to attach the
> current thread to a Java VM
> ```
That's probably because the release callback is called at process exit,
after the JVM has shut down?
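
If that is the cause, the ordering issue can be sketched as follows. `FakeJVM` and `FakeBatch` are hypothetical stand-ins, not the JPype or pyarrow APIs, and the log strings are invented for illustration:

```python
import weakref


class FakeJVM:
    """Hypothetical stand-in for the JVM; not the JPype API."""

    def __init__(self):
        self.running = True
        self.log = []

    def release(self, name):
        # Mimics a release callback that needs a live JVM to attach to
        if self.running:
            self.log.append(f"released {name}")
        else:
            self.log.append(f"WARNING: failed to release {name}")

    def shutdown(self):
        self.running = False


class FakeBatch:
    """Hypothetical batch whose memory is owned by the fake JVM."""

    def __init__(self, jvm, name):
        weakref.finalize(self, jvm.release, name)


def run(drop_before_shutdown):
    jvm = FakeJVM()
    rb = FakeBatch(jvm, "rb")
    if drop_before_shutdown:
        del rb  # release runs while the JVM is still up
    jvm.shutdown()
    if not drop_before_shutdown:
        del rb  # release runs too late
    return jvm.log
```

Here `run(True)` logs a clean release, while `run(False)` logs the warning, so dropping every reference to the batches before the JVM goes away would avoid the message under this assumption.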
> Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
> RecordBatchReader work with other pyarrow APIs (dataset)?
That would probably not solve the problem. Users can trivially get a
strong reference from the weakref, and keep it alive too long.
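
For illustration, here is one way a strong reference can leak out of a `weakref.proxy`, using a hypothetical `FakeBatch` class rather than a real `pa.RecordBatch`:

```python
import weakref


class FakeBatch:
    """Hypothetical stand-in for a record batch."""

    def num_rows(self):
        return 3


batch = FakeBatch()
proxy = weakref.proxy(batch)

# A bound method looked up through the proxy is bound to the real object,
# so its __self__ attribute is a strong reference to the referent.
strong = proxy.num_rows.__self__

del batch  # the producer drops its own reference...
rows_after_del = strong.num_rows()  # ...but the object is still alive
```

As long as `strong` exists, the underlying object (and, in the real case, its backing memory) stays alive regardless of the proxy.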
Regards
Antoine.