alamb opened a new issue, #10720:
URL: https://github.com/apache/datafusion/issues/10720

   ### Is your feature request related to a problem or challenge?
   
   The DuckDB blog shows off a really cool new feature (accessing remote datasets from Hugging Face):
   
   
https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb
   
   I think doing this with DataFusion would be quite cool and quite simple to implement. Being able to add such support quickly would be a good example of how DataFusion's extensibility allows rapid feature development, as well as being a cool project in its own right.
   
   ### Describe the solution you'd like
   
   I would like to support this type of query from `datafusion-cli`:
   
   ```sql
   SELECT *
   FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
   ```
   
   ### Describe alternatives you've considered
   
   I think we can just follow the same model as the existing object store integration in `datafusion-cli`:
   
   
https://github.com/apache/datafusion/blob/088ad010a6ceaa6a2e810d418a2370e45acf3d54/datafusion-cli/src/object_storage.rs#L419-L496
   
   And register the `hf` URL with a specially created `Http` object store instance.
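
As a rough illustration of the URL handling such an integration might need, here is a minimal, std-only sketch that rewrites an `hf://datasets/...` URL into an HTTPS URL that an HTTP-backed object store could fetch. The helper name `hf_to_https` is hypothetical and not part of DataFusion. The `https://huggingface.co/datasets/<owner>/<repo>/resolve/<revision>/<path>` layout and the default `main` revision are assumptions about how Hugging Face serves raw files, not something stated in this issue.

```rust
/// Hypothetical helper (not a DataFusion API): map an `hf://` dataset URL to
/// an HTTPS URL. Assumes the Hugging Face "resolve" URL layout and the
/// `main` revision; both are illustrative assumptions.
fn hf_to_https(url: &str) -> Option<String> {
    // Only the `hf://datasets/` form shown in the example query is handled.
    let rest = url.strip_prefix("hf://datasets/")?;

    // Expect `<owner>/<repo>/<path inside the repo>`; the path may contain '/'.
    let mut parts = rest.splitn(3, '/');
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let repo = parts.next().filter(|s| !s.is_empty())?;
    let path = parts.next().filter(|s| !s.is_empty())?;

    Some(format!(
        "https://huggingface.co/datasets/{}/{}/resolve/main/{}",
        owner, repo, path
    ))
}
```

Under these assumptions, the URL from the example query would be rewritten to a `.../datasets/datasets-examples/doc-formats-csv-1/resolve/main/data.csv` address, while URLs with any other scheme return `None` and could fall through to the existing object store handling.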
   
   ### Additional context
   
   _No response_



