jonkeane commented on pull request #11730:
URL: https://github.com/apache/arrow/pull/11730#issuecomment-1016801995


   Thank you so much for all this info. I chatted briefly with @pdet about 
this. Here are some notes from that conversation (though please correct me if 
I'm misrepresenting any of this!)
   
   > In reality there were five exec plans created. Four of these came from 
calls to RArrowTabularStreamFactory::Produce (duckdb code). All four calls to 
Produce had the same factory_p. All four of these yield 4 batches (as expected) 
although the first three scans don't appear to be consumed. The first three 
calls had an empty project_columns and a null filters. The fourth call had a 
valid project_columns and a valid filters.
   > 
   > Across the run 12 instances of ExportedArrayStream are created. Only 4 of 
these are released. I don't think this is a problem but figured I'd record it. 
I believe the private data here is a shared_ptr so these could just be copies 
but for something that has a Release method I expected parity between creation 
and Release. Each of the first three calls to 
RArrowTabularStreamFactory::Produce generates 2 ExportedArrayStream instances 
and releases 1. None of the ExportedArrayStream instances created by these calls ever 
yields any actual arrays.
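   
   As a rough illustration of why instance counts and release counts need not match, here is a toy Python model of the release contract in the Arrow C stream interface. It is only a sketch: `ToyStream`, `export_stream`, `move_stream`, and `release_if_live` are hypothetical names made up for this example, and it does not claim to describe how ExportedArrayStream or the DuckDB code behaves. It relies on one rule from the C interface: a released (or moved-from) struct has its `release` callback set to NULL, so the callback runs once per exported stream rather than once per struct.

```python
from dataclasses import dataclass
from typing import Callable, ClassVar, Optional


@dataclass
class ToyStream:
    """Hypothetical stand-in for struct ArrowArrayStream (release + private_data only)."""

    release: Optional[Callable[["ToyStream"], None]]
    private_data: object
    instances: ClassVar[int] = 0  # counts every struct that gets created

    def __post_init__(self) -> None:
        ToyStream.instances += 1


release_calls = 0  # counts how often the producer's release callback runs


def release_callback(stream: ToyStream) -> None:
    """Producer-side release: free resources, then mark the struct released."""
    global release_calls
    release_calls += 1
    stream.private_data = None
    stream.release = None  # a NULL release callback means "already released"


def export_stream(data: object) -> ToyStream:
    """Create a live stream struct owned by the caller."""
    return ToyStream(release=release_callback, private_data=data)


def move_stream(src: ToyStream) -> ToyStream:
    """Move ownership: copy the fields, then mark the source as released
    without calling its release callback."""
    dst = ToyStream(release=src.release, private_data=src.private_data)
    src.release = None
    src.private_data = None
    return dst


def release_if_live(stream: ToyStream) -> None:
    """Consumer-side helper: only call release on a struct that is still live."""
    if stream.release is not None:
        stream.release(stream)


original = export_stream(["batch0", "batch1", "batch2", "batch3"])
moved = move_stream(original)

release_if_live(original)  # no-op: ownership already moved away
release_if_live(moved)     # the single real release

print(f"structs created: {ToyStream.instances}, release calls: {release_calls}")
```

   In this model two structs are created but the release callback runs once, which has the same shape as the "2 created, 1 released" observation above. Whether that is what actually happens in these scans would still need to be confirmed against the real code.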
   
   @pdet these extra calls are for getting the schema information, yeah?
   
   The other source in DuckDB that might be relevant (and would be good for us 
to check from a using-Arrow's-C-interface-correctly perspective) is in: 
https://github.com/duckdb/duckdb/blob/master/src/main/query_result.cpp
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

