This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new bb6c57f81f Minor: add more examples for `CREATE EXTERNAL TABLE` doc (#7594)
bb6c57f81f is described below
commit bb6c57f81fc3648530ec81ac2a636e55b91238ae
Author: comphead <[email protected]>
AuthorDate: Tue Sep 19 03:42:10 2023 -0700
Minor: add more examples for `CREATE EXTERNAL TABLE` doc (#7594)
* Minor: add more example for Create Table doc
* More desc
* fmt
* reorg
---
docs/source/user-guide/cli.md | 112 +++++++++++++++++++++++++-----------------
1 file changed, 66 insertions(+), 46 deletions(-)
diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index e3a8cd74c3..e1f332baf3 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -23,49 +23,6 @@ The DataFusion CLI is a command-line interactive SQL utility for executing
queries against any supported data files. It is a convenient way to
try DataFusion's SQL support with your own data.
-## Example
-
-Create a CSV file to query.
-
-```shell
-$ echo "a,b" > data.csv
-$ echo "1,2" >> data.csv
-```
-
-Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
-
-```shell
-$ datafusion-cli
-DataFusion CLI v17.0.0
-❯ select * from 'data.csv';
-+---+---+
-| a | b |
-+---+---+
-| 1 | 2 |
-+---+---+
-1 row in set. Query took 0.007 seconds.
-```
-
-You can also query directories of files with compatible schemas:
-
-```shell
-$ ls data_dir/
-data.csv data2.csv
-```
-
-```shell
-$ datafusion-cli
-DataFusion CLI v16.0.0
-❯ select * from 'data_dir';
-+---+---+
-| a | b |
-+---+---+
-| 3 | 4 |
-| 1 | 2 |
-+---+---+
-2 rows in set. Query took 0.007 seconds.
-```
-
## Installation
### Install and run using Cargo
@@ -131,17 +88,64 @@ OPTIONS:
-V, --version Print version information
```
-## Selecting files directly
+## Querying data from the files directly
Files can be queried directly by enclosing the file or
directory name in single `'` quotes as shown in the example.
+## Example
+
+Create a CSV file to query.
+
+```shell
+$ echo "a,b" > data.csv
+$ echo "1,2" >> data.csv
+```
+
+Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
+
+```shell
+$ datafusion-cli
+DataFusion CLI v17.0.0
+❯ select * from 'data.csv';
++---+---+
+| a | b |
++---+---+
+| 1 | 2 |
++---+---+
+1 row in set. Query took 0.007 seconds.
+```
+
+You can also query directories of files with compatible schemas:
+
+```shell
+$ ls data_dir/
+data.csv data2.csv
+```
+
+```shell
+$ datafusion-cli
+DataFusion CLI v16.0.0
+❯ select * from 'data_dir';
++---+---+
+| a | b |
++---+---+
+| 3 | 4 |
+| 1 | 2 |
++---+---+
+2 rows in set. Query took 0.007 seconds.
+```
+
+## Creating external tables
+
It is also possible to create a table backed by files by explicitly
-via `CREATE EXTERNAL TABLE` as shown below.
+via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards supported
## Registering Parquet Data Sources
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
+
+Register a single file parquet datasource
```sql
CREATE EXTERNAL TABLE taxi
@@ -149,6 +153,22 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+Register a single folder parquet datasource. All files inside must be valid parquet files!
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/';
+```
+
+Register a single folder parquet datasource by specifying a wildcard for files to read
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/*.parquet';
+```
+
## Registering CSV Data Sources
CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.