a-agmon commented on issue #16303: URL: https://github.com/apache/datafusion/issues/16303#issuecomment-2953899593
I gave it a shot, but it ended up somewhat messy. That's mostly because `TableFunctionImpl::call()` is synchronous, yet it also needs the schema of the data, which for remote blobs (like S3) requires IO and, to be done right, async. I tried to work around this by having `call()` create a `TableProvider` that initially reports an empty schema, which satisfies the planner's synchronous API. The actual schema discovery is deferred until `scan()` is called during the asynchronous execution phase. But this creates an issue with projections that need the schema for validation: `select X from read_csv(some-glob-pattern)` does not work, though `select * from read_csv(some-glob-pattern)` will.
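To make the failure mode concrete, here is a toy Rust model of the empty-schema workaround. It is only a sketch and does not use DataFusion's real API: `LazyCsvProvider`, `Projection`, `plan`, `scan` and `discover_schema` are hypothetical stand-ins, the discovered columns `x` and `y` are made up, and the async IO is replaced by a plain function call.

```rust
// Toy model only: every type and function here is a hypothetical stand-in,
// not part of DataFusion.

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

/// Stand-in for a table provider whose real schema is unknown at planning time.
struct LazyCsvProvider {
    glob: String,
    /// What the planner sees; stays empty because `call` may not do IO.
    planning_schema: Schema,
}

/// Stand-in for the synchronous `call()`: it cannot await remote IO,
/// so it reports an empty schema.
fn call(glob: &str) -> LazyCsvProvider {
    LazyCsvProvider {
        glob: glob.to_string(),
        planning_schema: Schema { columns: Vec::new() },
    }
}

/// Stand-in for remote schema inference. In reality this would be async IO;
/// the column names returned here are invented for the example.
fn discover_schema(_glob: &str) -> Schema {
    Schema {
        columns: vec!["x".to_string(), "y".to_string()],
    }
}

enum Projection {
    Star,
    Columns(Vec<String>),
}

/// Planner-side validation: it only sees the planning-time schema.
fn plan(provider: &LazyCsvProvider, projection: &Projection) -> Result<(), String> {
    match projection {
        // `select *` needs no column lookup, so the empty schema is harmless.
        Projection::Star => Ok(()),
        // Named columns must resolve against the (still empty) schema.
        Projection::Columns(cols) => {
            for c in cols {
                if !provider.planning_schema.columns.contains(c) {
                    return Err(format!("column '{}' not found at planning time", c));
                }
            }
            Ok(())
        }
    }
}

/// Execution-side scan: the schema is finally discovered here, too late
/// for the validation that already ran in `plan`.
fn scan(provider: &LazyCsvProvider) -> Schema {
    discover_schema(&provider.glob)
}
```

In this model, `plan(&call("some-glob-pattern"), &Projection::Star)` succeeds, while the same call with `Projection::Columns(vec!["x".to_string()])` returns an error even though `scan` would later report a column named `x`.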