changhiskhan commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1424733395

   > The toplevel proposal sounds like sidestepping that entirely by 
introducing a separate abstraction layer at the Python level (hence, exposing 
ABCs in Python).
   
   Yup, that's exactly the proposal here.
   
   > I don't think they had a choice, because there's not really a formal API 
for what they really want :)
   
   The main blocker in the current version of DuckDB is that it calls static 
methods such as `Scanner.from_dataset`. I made a PR to change that to 
use the instance method `dataset.to_scanner`, so as of the next release it will be 
*possible* for Rust packages to disguise themselves as pyarrow datasets to 
DuckDB.
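
To make the static-vs-instance distinction concrete, here is a minimal sketch. Every class and function in it is a hypothetical stand-in written for illustration; none of it is pyarrow or DuckDB code, and the only name taken from the discussion is `to_scanner`.

```python
# Hypothetical stand-ins for illustration only; not pyarrow or DuckDB APIs.


class NativeDataset:
    """Plays the role of a dataset backed by a native (C++) object."""

    def to_scanner(self):
        return "native scanner"


class Scanner:
    @staticmethod
    def from_dataset(dataset):
        # A static constructor decides for itself what it accepts, so a
        # look-alike object never gets the chance to supply its own scanner.
        if not isinstance(dataset, NativeDataset):
            raise TypeError("expected a NativeDataset")
        return "native scanner"


class LookalikeDataset:
    """A third-party object that only mimics the dataset interface."""

    def to_scanner(self):
        return "custom scanner"


def scan_via_static(dataset):
    # Consumer code written against the static constructor.
    return Scanner.from_dataset(dataset)


def scan_via_instance(dataset):
    # Consumer code that dispatches on the object it was handed.
    return dataset.to_scanner()
```

With the instance-method style, `scan_via_instance(LookalikeDataset())` reaches the look-alike's own implementation, whereas `scan_via_static` rejects it outright.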
   
   The issue is that you have to be really careful to override all of the 
methods in `Dataset`, or else it'll try to unwrap the non-existent `CDataset` and 
crash Python. This is my main motivation for proposing a pure Python 
abstraction on top.
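
As a rough sketch of what such a pure Python layer could buy, assume an abstract base class with just two members. The class names and the `schema`/`to_scanner` signatures below are assumptions made for illustration, not a proposed or existing pyarrow API. The point is that a forgotten method surfaces as an ordinary `TypeError` at construction time instead of a crash later on.

```python
from abc import ABC, abstractmethod


class AbstractDataset(ABC):
    """Hypothetical pure-Python interface; not an actual pyarrow class."""

    @property
    @abstractmethod
    def schema(self):
        """Return a description of the columns."""

    @abstractmethod
    def to_scanner(self, columns=None):
        """Return an object that yields the requested columns."""


class IncompleteDataset(AbstractDataset):
    """Forgets to implement to_scanner."""

    @property
    def schema(self):
        return ["id", "value"]


class CompleteDataset(AbstractDataset):
    """Implements every abstract member."""

    @property
    def schema(self):
        return ["id", "value"]

    def to_scanner(self, columns=None):
        # Stand-in for a real scanner: just echo the selected columns.
        return list(columns or self.schema)


# The missing override is reported when the object is created.
try:
    IncompleteDataset()
except TypeError as exc:
    error = str(exc)
```

Here `error` names the missing `to_scanner` method, while `CompleteDataset()` instantiates normally.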
   
   
   Would y'all be open to accepting a PR for this? Or is there a more formal 
process for proposing the details?

