xinlifoobar commented on code in PR #10792:
URL: https://github.com/apache/datafusion/pull/10792#discussion_r1632877093


##########
docs/source/user-guide/cli/datasources.md:
##########
@@ -347,3 +347,60 @@ Supported configuration options are:
 | `GOOGLE_APPLICATION_CREDENTIALS` | `gcp.application_credentials_path` | location of application credentials file |
 | `GOOGLE_BUCKET`                  |                                    | bucket name                              |
 | `GOOGLE_BUCKET_NAME`             |                                    | (alias) bucket name                      |
+
+## Hugging Face
+
+The `datafusion-cli` supports querying datasets from the [Hugging Face Hub](https://huggingface.co/datasets) for both public and private datasets.
+
+For example, to query a public dataset directly from the Hugging Face Hub:
+
+```sql
+SELECT question, answer
+FROM "hf://datasets/cais/mmlu/astronomy/dev-00000-of-00001.parquet";
+```
+
+It is also possible to query a list of files from a dataset:
+
+```sql
+CREATE EXTERNAL TABLE astronomy
+STORED AS parquet
+LOCATION "hf://datasets/cais/mmlu/astronomy/";
+```
+
+and then
+
+```sql
+SELECT question, answer
+FROM astronomy;
+```
+
+To query a private dataset, you need to set either the `hf.user_access_token` option or the `HF_USER_ACCESS_TOKEN` environment variable:
+
+```sql
+CREATE EXTERNAL TABLE astronomy
+OPTIONS (
+    'hf.user_access_token' '******'
+)
+STORED AS parquet
+LOCATION "hf://datasets/cais/mmlu/astronomy/";
+```

Review Comment:
   This example does not currently work, for two reasons:
   1. The Hugging Face user access token is case-sensitive.
   2. A previous change (https://github.com/apache/datafusion/pull/9723/files) forces every option value to lower case, so a token passed through `OPTIONS` would no longer match the original.
   
   I will look into the history of that change to see whether supporting this is feasible.
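
To illustrate how the two points above interact, here is a minimal sketch in Python. It is not DataFusion code: `normalize_options` is a hypothetical stand-in for an option parser that lower-cases every value, and the token is made up.

```python
# Hypothetical sketch, not DataFusion code. It mimics an option parser
# that lower-cases every option value, as the linked change is described
# as doing in the comment above.

def normalize_options(options: dict[str, str]) -> dict[str, str]:
    """Lower-case every option value (the behavior that causes the problem)."""
    return {key: value.lower() for key, value in options.items()}


# Made-up mixed-case token; real access tokens are case-sensitive secrets.
original = {"hf.user_access_token": "hf_AbCdEf123"}
normalized = normalize_options(original)

# After normalization the token differs from what the user supplied,
# so a case-sensitive service would reject it.
token_preserved = (
    normalized["hf.user_access_token"] == original["hf.user_access_token"]
)
print(token_preserved)
```

Under this assumption, any token containing an upper-case character is changed before it is used, which would explain why the `OPTIONS` example fails.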
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

