On 29/06/2023 at 09:50, Wenbo Hu wrote:
Hi,

I'm using JPype to pass streams back and forth between Java and Python.

The following code works fine, and its release callback is invoked:
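```python
import logging

import jpype.imports  # enables Java package imports such as org.apache.arrow.c
import pyarrow as pa
from pyarrow.cffi import ffi as arrow_c  # assumed: `arrow_c` is pyarrow's cffi FFI object

# Assumed: the JVM is already running (jpype.startJVM) with the Arrow C Data jar
# on the classpath; `child_allocator` and `some_package` are the poster's own helpers.
import org.apache.arrow.c  # noqa: E402

logger = logging.getLogger(__name__)

with child_allocator("test-allocator") as allocator:
    r = some_package.InMemoryArrowReader.create(allocator)
    c_stream = arrow_c.new("struct ArrowArrayStream*")
    c_stream_ptr = int(arrow_c.cast("uintptr_t", c_stream))

    s = org.apache.arrow.c.ArrowArrayStream.wrap(c_stream_ptr)
    org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)

    with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
        for rb in stream:  # type: pa.RecordBatch
            logger.info(rb.num_rows)  # yield weakref.proxy(rb)
            del rb  # release callback directly called in current thread?
```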

But if the `del` statement is not called, the allocator on the Java side
raises an exception reporting a memory leak.

That's not surprising. The `rb` variable keeps the record batch and its backing memory alive. This is the expected semantics. Otherwise, accessing `rb`'s contents would crash.

Also, a warning message is printed to stderr:
```
WARNING: Failed to release Java C Data resource: Failed to attach the
current thread to a Java VM
```

That's probably because the release callback is called at process exit, after the JVM has shut down?

> Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
> RecordBatchReader work with other pyarrow APIs (dataset)?

That would probably not solve the problem. Users can trivially get a strong reference from the weakref, and keep it alive too long.
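A minimal pure-Python sketch of that concern. `Batch` is a hypothetical stand-in for a record batch, and a `weakref.finalize` callback plays the role of the C Data release callback. It relies on CPython's reference counting, which finalizes an object as soon as its last strong reference disappears:

```python
import weakref

released = []  # records when the stand-in "release callback" fires


class Batch:
    """Hypothetical stand-in for a record batch whose memory is owned elsewhere."""

    def __init__(self, name: str) -> None:
        self.name = name


producer_side = Batch("batch-0")
# Stand-in for the C Data release callback.
weakref.finalize(producer_side, released.append, "batch-0")

handed_out = weakref.ref(producer_side)  # what the reader would yield instead of the batch
kept = handed_out()                      # the consumer upgrades it to a strong reference

del producer_side                        # the producer thinks it has let go
before_consumer_drop = list(released)
print(before_consumer_drop)  # prints [] because `kept` still holds the batch

del kept
print(released)  # prints ['batch-0'] once the consumer drops its reference
```

With a proxy instead of a plain weak reference there is nothing to call, but any child object obtained through it (a column, for instance) would presumably keep the underlying buffers alive in the same way.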

Regards

Antoine.
