alamb opened a new issue, #10720: URL: https://github.com/apache/datafusion/issues/10720
### Is your feature request related to a problem or challenge?

The DuckDB blog shows off a really cool new feature (accessing remote datasets from Hugging Face): https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb

I think doing this with DataFusion would be quite cool and quite simple to implement. Being able to add such support quickly would be a good example of how DataFusion's extensibility allows rapid feature development, as well as being a cool project.

### Describe the solution you'd like

I would like to support this type of query from `datafusion-cli`:

```sql
SELECT * FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```

### Describe alternatives you've considered

I think we can just follow the same model as the existing object store integration in `datafusion-cli`:

https://github.com/apache/datafusion/blob/088ad010a6ceaa6a2e810d418a2370e45acf3d54/datafusion-cli/src/object_storage.rs#L419-L496

and register the `hf` URL with a specially created `Http` object store instance.

### Additional context

_No response_
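
For illustration only, here is a rough sketch of the URL translation that an `hf` scheme backed by an `Http` object store would probably need. The helper name `hf_to_https` is hypothetical and not part of DataFusion. The `https://huggingface.co/datasets/<owner>/<dataset>/resolve/main/<path>` layout and the fixed `main` revision are assumptions about how Hugging Face serves files, and they would need to be checked before relying on them.

```rust
/// Hypothetical helper (not an existing DataFusion API): map an
/// `hf://datasets/<owner>/<dataset>/<path>` URL to an HTTPS URL.
///
/// Assumptions: files are served under `/resolve/<revision>/` on
/// huggingface.co, and the revision is always `main`.
fn hf_to_https(url: &str) -> Option<String> {
    // Only the `hf://datasets/` prefix is handled in this sketch.
    let rest = url.strip_prefix("hf://datasets/")?;

    // Split into at most three pieces: owner, dataset, and the file path
    // (which may itself contain further `/` separators).
    let mut parts = rest.splitn(3, '/');
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let dataset = parts.next().filter(|s| !s.is_empty())?;
    let path = parts.next().filter(|s| !s.is_empty())?;

    Some(format!(
        "https://huggingface.co/datasets/{}/{}/resolve/main/{}",
        owner, dataset, path
    ))
}
```

Calling `hf_to_https` on the example URL from the query above would yield the HTTPS location to hand to the HTTP object store. Any URL that does not start with `hf://datasets/`, or that lacks an owner, dataset, or file path, yields `None`.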
