westonpace commented on PR #13800:
URL: https://github.com/apache/datafusion/pull/13800#issuecomment-2568592701

   @alamb I've rebased and updated the example.  I think the only remaining 
issue is your comment here:
   
   > When trying this API out I didn't fully understand this API (or what i 
should return) -- maybe if it is optional / has an advanced usecase we could 
provide a default implementation
   
   The basic problem, I think, is that because we evaluate the table 
references directly ourselves, we have to figure out which table references 
apply to the schema provider.  In the sync case this is not necessary because 
you do this:
   
   ```
   ctx.catalog("my_catalog").unwrap().register_schema("my_schema", Arc::clone(&remote_schema))?;
   ```
   
   As a result, by the time the query planner is even using your table 
provider, it already has done the lookup into `my_catalog` and `my_schema`.  
We, however, can't rely on that, because we're working with the table providers 
themselves.  Or, to put it another way, I am replacing this part of your 
example:
   
   ```
       let table_names = references.iter().filter_map(|r| {
           if refers_to_schema("datafusion", "remote_schema", r) {
               Some(r.table())
           } else {
               None
           }
       });
   ```
   
   I need to filter down the references to figure out which ones apply to the 
schema provider before I send the request out to the remote catalog (to allow 
the possibility that other requests are handled elsewhere).
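
To make the filtering step concrete, here is a minimal sketch of what a helper like `refers_to_schema` could look like. It is an illustration, not the code in this PR: `TableRef` is a hypothetical, simplified stand-in for DataFusion's `TableReference`, the extra default-catalog and default-schema parameters are my own addition, and I am assuming the defaults are `datafusion` and `public`.

```rust
// Hypothetical, simplified stand-in for DataFusion's `TableReference`.
#[derive(Debug, Clone, PartialEq)]
enum TableRef {
    Bare { table: String },
    Partial { schema: String, table: String },
    Full { catalog: String, schema: String, table: String },
}

impl TableRef {
    // The table name, whichever variant this is.
    fn table(&self) -> &str {
        match self {
            TableRef::Bare { table }
            | TableRef::Partial { table, .. }
            | TableRef::Full { table, .. } => table,
        }
    }
}

// Hypothetical helper: true if `r` resolves to `catalog.schema`.
// Parts missing from the reference are filled in from the defaults.
fn refers_to_schema(
    catalog: &str,
    schema: &str,
    r: &TableRef,
    default_catalog: &str,
    default_schema: &str,
) -> bool {
    let (ref_catalog, ref_schema) = match r {
        TableRef::Bare { .. } => (default_catalog, default_schema),
        TableRef::Partial { schema: s, .. } => (default_catalog, s.as_str()),
        TableRef::Full { catalog: c, schema: s, .. } => (c.as_str(), s.as_str()),
    };
    ref_catalog == catalog && ref_schema == schema
}

// Keep only the table names that belong to `catalog.schema`.
// Assumed defaults: catalog "datafusion", schema "public".
fn tables_for_schema<'a>(
    references: &'a [TableRef],
    catalog: &str,
    schema: &str,
) -> Vec<&'a str> {
    references
        .iter()
        .filter_map(|r| {
            if refers_to_schema(catalog, schema, r, "datafusion", "public") {
                Some(r.table())
            } else {
                None
            }
        })
        .collect()
}
```

With this sketch, only the names returned by `tables_for_schema` would be sent to the remote catalog. References that resolve to another catalog or schema are left for whatever else handles them.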
   
   The reason I don't love my current solution is that I think these 
methods can eventually go away.  If we add `register_async_catalog` / 
`register_async_schema` methods to the `SessionContext` and move the caching 
into it, then we can rely on the lookup mechanism that already exists there.
   
   Still, I don't think these methods are onerous for the implementer and would 
rather just make progress.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

