Thanks for your explanation, Antoine.

I figured out why I'm seeing the memory leak and need to call `del` explicitly.
My example code may have been misleading. The key problem is that when I
wrap the code that converts the Java stream to a RecordBatchReader, I create
a child allocator from the current context (it lives only as long as the
RecordBatchReader) and pass it to exportArrayStream inside a generator. The
consumer/callback therefore always outlives the RecordBatchReader and its
underlying allocator (not the allocator of the Java stream, but the one
passed to exportArrayStream).

When I give the conversion a longer-lived allocator (one that lives as long
as the consumer/callback), the code works as expected.
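
To illustrate the lifetime issue without JPype or pyarrow, here is a minimal sketch in plain Python. Everything in it (`ToyAllocator`, `ToyBatch`, `read_batches`, and this `child_allocator`) is a hypothetical stand-in I made up for illustration, not the real Arrow Java or pyarrow API. It only assumes that an allocator complains on close while buffers are still outstanding.

```python
from contextlib import contextmanager


class ToyAllocator:
    """Hypothetical stand-in for an allocator that checks for leaks on close."""

    def __init__(self, name):
        self.name = name
        self.outstanding = 0  # number of buffers not yet released

    def close(self):
        if self.outstanding:
            raise RuntimeError(
                f"{self.name}: memory leaked ({self.outstanding} buffers)"
            )


class ToyBatch:
    """Hypothetical stand-in for a batch whose release returns memory."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.released = False
        allocator.outstanding += 1

    def release(self):
        # Idempotent, like a release callback that may only fire once.
        if not self.released:
            self.allocator.outstanding -= 1
            self.released = True


@contextmanager
def child_allocator(name):
    allocator = ToyAllocator(name)
    try:
        yield allocator
    finally:
        allocator.close()  # raises if anything is still outstanding


def read_batches(allocator=None, n=2):
    """Yield n batches; without an allocator, create a reader-scoped one."""
    if allocator is not None:
        for _ in range(n):
            yield ToyBatch(allocator)
        return
    # Problem case: this allocator is closed as soon as the generator is
    # exhausted, even if the consumer still holds batches.
    with child_allocator("reader-scoped") as own:
        for _ in range(n):
            yield ToyBatch(own)


if __name__ == "__main__":
    try:
        held = list(read_batches())  # consumer keeps the batches
    except RuntimeError as exc:
        print("short-lived allocator:", exc)

    with child_allocator("consumer-scoped") as long_lived:
        held = list(read_batches(long_lived))
        for batch in held:
            batch.release()  # released before the allocator closes
    print("long-lived allocator: closed cleanly")
```

In the first case the reader-scoped allocator is closed while the consumer still holds batches, so the close fails. In the second case the allocator outlives the consumer, and every batch is released before it closes.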

Antoine Pitrou <anto...@python.org> wrote on Thu, Jun 29, 2023 at 17:55:
>
>
> On 29/06/2023 at 09:50, Wenbo Hu wrote:
> > Hi,
> >
> > I'm using Jpype to pass streams between java and python back and forth.
> >
> > The following code works fine with its release callback:
> > ```python
> >
> >      with child_allocator("test-allocator") as allocator:
> >          r = some_package.InMemoryArrowReader.create(allocator)
> >          c_stream = arrow_c.new("struct ArrowArrayStream*")
> >          c_stream_ptr = int(arrow_c.cast("uintptr_t", c_stream))
> >
> >          s = org.apache.arrow.c.ArrowArrayStream.wrap(c_stream_ptr)
> >          org.apache.arrow.c.Data.exportArrayStream(allocator, r, s)
> >
> >          with pa.RecordBatchReader._import_from_c(c_stream_ptr) as stream:
> >              for rb in stream:  # type: pa.RecordBatch
> >                  logger.info(rb.num_rows)     # yield weakref.proxy(rb)
> >                  del rb     # release callback directly called in current thread?
> > ```
> >
> > But if the del statement is not called, the allocator on the Java side
> > raises an exception reporting a memory leak.
>
> That's not surprising. The `rb` variable keeps the record batch and its
> backing memory alive. This is the expected semantics. Otherwise,
> accessing `rb`'s contents would crash.
>
> > Also, a warning message is printed to stderr:
> > ```
> > WARNING: Failed to release Java C Data resource: Failed to attach the
> > current thread to a Java VM
> > ```
>
> That's probably because the release callback is called at process exit,
> after the JVM is shutdown?
>
>  > Is yielding a weakref-ed `rb` a good idea? Will the weakref-ed
>  > RecordBatchReader work with other pyarrow APIs (dataset)?
>
> That would probably not solve the problem. Users can trivially get a
> strong reference from the weakref, and keep it alive too long.
>
> Regards
>
> Antoine.



-- 
---------------------
Best Regards,
Wenbo Hu,
