tustvold opened a new issue, #2206: URL: https://github.com/apache/arrow-datafusion/issues/2206
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

This has been discussed in various places, https://github.com/apache/arrow-datafusion/issues/907 and https://github.com/datafusion-contrib/datafusion-objectstore-s3/pull/53 to name a few, so I am creating an issue for visibility.

**Describe the solution you'd like**

I would propose creating a new datafusion-contrib crate, perhaps `datafusion-catalog-glue`, which communicates with an [AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html).

I'll leave the exact design to whoever picks this up, but I might expect something along the following lines.

* Create a `GlueCatalog` with an optional catalog ID
* Provide an `async fn GlueCatalog::list_databases(&self) -> Vec<String>` to list the databases
* Provide an `async fn GlueCatalog::get_database(&self, name: &str) -> Result<GlueDatabase>` to get a database
* Implement `SchemaProvider` for `GlueDatabase`

I think it should be possible to reuse the `FileScanConfig` structure used by `ListingTable` to simplify the implementation of the `TableProvider`.

**Describe alternatives you've considered**

We could choose not to support AWS Glue.

**Additional context**

This will help with https://github.com/datafusion-contrib/datafusion-objectstore-s3/pull/53 by removing the need to infer the schema from the files on every query, and by listing files only in non-pruned partitions.

This may need to depend on https://github.com/datafusion-contrib/datafusion-objectstore-s3, as I think it will still need to list S3 in order to get the files within a given partition.

The Glue API is not the snappiest of things, so a future extension might be to cache the metadata returned, as is done by the [Java client](https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore#enabling-client-side-caching-for-catalog).
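To make the proposed API shape a little more concrete, here is a rough, dependency-free sketch. It is only an illustration under stated assumptions: the methods are synchronous and backed by an in-memory map rather than the Glue API, whereas the proposal above uses `async fn`. `GlueTable`, its fields, `register_database` and `sample_catalog` are hypothetical names that do not exist anywhere yet. `table_names` and `table` on `GlueDatabase` are meant to loosely mirror what a `SchemaProvider` implementation would expose, not to reproduce the real trait.

```rust
use std::collections::BTreeMap;

/// Hypothetical table metadata, roughly what a Glue lookup might yield.
#[derive(Debug, Clone, PartialEq)]
pub struct GlueTable {
    pub name: String,
    /// Storage location of the table data, e.g. an S3 prefix.
    pub location: String,
    /// (column name, Glue type string) pairs; real code would map these to Arrow types.
    pub columns: Vec<(String, String)>,
    pub partition_keys: Vec<String>,
}

/// Stand-in for the proposed `GlueDatabase`.
#[derive(Debug, Clone, PartialEq)]
pub struct GlueDatabase {
    pub name: String,
    tables: BTreeMap<String, GlueTable>,
}

impl GlueDatabase {
    pub fn new(name: &str, tables: Vec<GlueTable>) -> Self {
        let tables = tables.into_iter().map(|t| (t.name.clone(), t)).collect();
        Self { name: name.to_string(), tables }
    }

    /// Loosely mirrors a schema provider's table listing.
    pub fn table_names(&self) -> Vec<String> {
        self.tables.keys().cloned().collect()
    }

    /// Loosely mirrors a schema provider's table lookup.
    pub fn table(&self, name: &str) -> Option<&GlueTable> {
        self.tables.get(name)
    }
}

/// Stand-in for the proposed `GlueCatalog`, backed by an in-memory map
/// instead of calls to AWS Glue.
pub struct GlueCatalog {
    catalog_id: Option<String>,
    databases: BTreeMap<String, GlueDatabase>,
}

impl GlueCatalog {
    /// `catalog_id` is optional, as in the proposal.
    pub fn new(catalog_id: Option<String>) -> Self {
        Self { catalog_id, databases: BTreeMap::new() }
    }

    pub fn catalog_id(&self) -> Option<&str> {
        self.catalog_id.as_deref()
    }

    /// Hypothetical helper that only exists to populate the in-memory stand-in.
    pub fn register_database(&mut self, db: GlueDatabase) {
        self.databases.insert(db.name.clone(), db);
    }

    /// Synchronous here; the proposal makes this an `async fn`.
    pub fn list_databases(&self) -> Vec<String> {
        self.databases.keys().cloned().collect()
    }

    /// Synchronous here; the proposal makes this an `async fn` returning `Result<GlueDatabase>`.
    pub fn get_database(&self, name: &str) -> Result<GlueDatabase, String> {
        self.databases
            .get(name)
            .cloned()
            .ok_or_else(|| format!("database '{}' not found", name))
    }
}

/// Builds a tiny example catalog with made-up names.
pub fn sample_catalog() -> GlueCatalog {
    let events = GlueTable {
        name: "events".to_string(),
        location: "s3://example-bucket/events/".to_string(),
        columns: vec![("id".to_string(), "bigint".to_string())],
        partition_keys: vec!["dt".to_string()],
    };
    let mut catalog = GlueCatalog::new(None);
    catalog.register_database(GlueDatabase::new("analytics", vec![events]));
    catalog
}
```

With something like this, `sample_catalog().get_database("analytics")` returns a database whose `table_names()` contains `events`, while an unknown database name yields an `Err`. A real implementation would replace the in-memory maps with Glue API calls and turn the stored column and partition metadata into a `TableProvider`.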
