westonpace commented on issue #10339:
URL: https://github.com/apache/datafusion/issues/10339#issuecomment-2504235618

   I've made a stab at this.
   
   > However, my personal opinion is that such encouragement can be done via 
documentation and if people want to implement RPC network calls during planning 
then the APIs shouldn't stop them
   
   The easiest way we've found to do this kind of thing is with a metadata 
cache.  However, this cache gets invalidated, starts out cold, etc.  The 
problem with "warming the cache prior to the query" is that it is very 
difficult to determine which entries will be required by an SQL string.  
Loading the entire catalog into memory for a single query is prohibitively 
expensive for us.
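
   As a rough illustration of the cold-start problem (a minimal sketch, not 
our actual implementation; `TableMeta`, `MetadataCache`, and the table names 
are hypothetical and are not DataFusion types), a synchronous lookup can only 
return what was warmed ahead of time:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for table metadata; not a DataFusion type.
#[derive(Clone, Debug, PartialEq)]
struct TableMeta {
    columns: Vec<String>,
}

/// Hypothetical metadata cache consulted by a synchronous planner.
struct MetadataCache {
    entries: HashMap<String, TableMeta>,
}

impl MetadataCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Warming step: in a real system the remote (async) fetch would
    /// happen here, before planning starts.
    fn warm(&mut self, table: &str, meta: TableMeta) {
        self.entries.insert(table.to_string(), meta);
    }

    /// Drop an entry, e.g. because the remote catalog changed.
    fn invalidate(&mut self, table: &str) {
        self.entries.remove(table);
    }

    /// Synchronous lookup used during planning. It cannot go to the
    /// network, so a cold or invalidated entry is simply a miss.
    fn lookup(&self, table: &str) -> Option<&TableMeta> {
        self.entries.get(table)
    }
}

fn main() {
    let mut cache = MetadataCache::new();
    cache.warm(
        "orders",
        TableMeta { columns: vec!["id".to_string(), "total".to_string()] },
    );

    // A warmed table resolves during planning.
    assert!(cache.lookup("orders").is_some());
    // A table nobody predicted the query would need is a miss.
    assert!(cache.lookup("customers").is_none());

    // After invalidation the previously warmed entry is gone too.
    cache.invalidate("orders");
    assert!(cache.lookup("orders").is_none());
}
```

   The sketch only shows the failure mode: the caller has to know the set of 
tables up front, which is exactly what is hard to derive from an SQL string.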
   
   > I think the biggest challenge is, as @metesynnada hints at above, the 
viral nature of async -- if we make such APIs async then everywhere they are 
called must also be async -- I haven't looked at how far down the stack that 
is but it could be substantial.
   
   Yes :cold_sweat:
   
   > An alternate approach might be to implement, via some hackery and tokio 
channels, a struct that implements the SchemaProvider without changes (sync) 
but can call async methods (though that would block the runtime thread 🤔 )
   
   I could find no reasonable way to implement such hackery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

