westonpace opened a new issue, #941:
URL: https://github.com/apache/datafusion-python/issues/941

   This is similar to #920 but maybe more specific.  Lance 
(https://github.com/lancedb/lance) has a custom table provider and I was 
interested in using datafusion-python with this table provider.  However, I'm 
not sure there is an easy solution.
   
   I was hoping, in Lance's python bindings, I could just do something like...
   
   ```
   use datafusion_python::context::PySessionContext;
   use pyo3::prelude::*;
   
   #[pyfunction]
   pub fn register_datafusion(ctx: &PySessionContext, tbl_name: String, ds_uri: String) -> PyResult<()> {
       // ...
   }
   ```
   
   Then use this in python as:
   
   ```
   from datafusion import SessionContext
   from .lance import register_datafusion
   
   ctx = SessionContext()
   register_datafusion(ctx.ctx, "my_tbl", "some_uri")
   ```
   
   Unfortunately, this leads to:
   
   ```
   TypeError: argument 'ctx': 'SessionContext' object cannot be converted to 'SessionContext'
   ```
   
   I suspect the problem is that the `SessionContext` linked into lance's 
python module is different from the `SessionContext` linked into 
datafusion_python's python module.
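
The suspicion above can be illustrated with a pure-Python analogy. This is only a sketch: `datafusion_like` and `lance_like` are hypothetical stand-ins for the two compiled modules, and the check in `register_datafusion` mimics an argument type check rather than reproducing what PyO3 does internally. It shows that two classes sharing the name `SessionContext` are still distinct types, which yields the same confusing message.

```python
import types


def make_module(name):
    """Build a throwaway module that defines its own SessionContext class."""
    module = types.ModuleType(name)
    exec("class SessionContext:\n    pass\n", module.__dict__)
    return module


# Hypothetical stand-ins for the two separately compiled extension modules.
datafusion_like = make_module("datafusion_like")
lance_like = make_module("lance_like")


def register_datafusion(ctx):
    """Accept only the SessionContext class that lance_like itself defines."""
    expected = lance_like.SessionContext
    if not isinstance(ctx, expected):
        raise TypeError(
            f"argument 'ctx': '{type(ctx).__name__}' object "
            f"cannot be converted to '{expected.__name__}'"
        )
    return True


ctx = datafusion_like.SessionContext()
same_name = type(ctx).__name__ == lance_like.SessionContext.__name__
same_type = type(ctx) is lance_like.SessionContext

try:
    register_datafusion(ctx)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(same_name, same_type)
print(error_message)
```

The names match but the type objects do not, so an identity-based type check rejects the argument even though both sides call it `SessionContext`.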
   
   Here are a few thoughts off the top of my head, though maybe there is 
something easier I am missing.
   
   1. Add Lance to datafusion-python
   
   A simple, but not ideal, solution is to just add lance as a dependency of 
datafusion-python.  However, I'm assuming that the datafusion-python project 
doesn't want third-party dependencies.
   
   2. Use pyarrow dataset as a "dataset protocol"
   
   The "dataset protocol" never quite got finished, but we can more or less use 
pyarrow datasets as the dataset protocol.  This is actually what I've ended up 
using for the time being.  I call `register_dataset`, and since `LanceDataset` 
already duck types as a pyarrow dataset, this works, but it's not as flexible.
   
   3. Add support via datafusion-federation
   
   I'm not entirely sure this is possible but it seems the 
[datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation)
 project may have a way of handling abstract table providers over Substrait.  
datafusion-python could add datafusion-federation as a dependency to allow a 
`register_federated` method.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

