jonkeane commented on pull request #11730: URL: https://github.com/apache/arrow/pull/11730#issuecomment-1016801995
Thank you so much for all this info. I chatted briefly with @pdet about this. Here are some notes from that conversation (though please correct me if I'm misrepresenting any of this!)

> In reality there were five exec plans created. Four of these came from calls to RArrowTabularStreamFactory::Produce (duckdb code). All four calls to Produce had the same factory_p. All four of these yield 4 batches (as expected) although the first three scans don't appear to be consumed. The first three calls had an empty project_columns and a null filters. The fourth call had a valid project_columns and a valid filters.
>
> Across the run 12 instances of ExportedArrayStream are created. Only 4 of these are released. I don't think this is a problem but figured I'd record it. I believe the private data here is a shared_ptr so these could just be copies but for something that has a Release method I expected parity between creation and Release. Each of the first three calls to RArrowTabularStreamFactory::Produce generates 2 ExportedArrayStream instances and releases 1. None of the ExportedArrayStream's created by these calls ever yields any actual arrays.

@pdet these extra calls are for getting the schema information, yeah?

The other source in DuckDB that might be relevant (and would be good for us to check from a using-Arrow's-C-interface-correctly perspective) is in: https://github.com/duckdb/duckdb/blob/master/src/main/query_result.cpp

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
