[GitHub] [arrow-datafusion] kmitchener commented on a diff in pull request #3005: various documentation updates

GitBox Wed, 03 Aug 2022 08:42:11 -0700


kmitchener commented on code in PR #3005:
URL: https://github.com/apache/arrow-datafusion/pull/3005#discussion_r936819921



##########
docs/source/user-guide/sql/ddl.md:
##########
@@ -30,9 +90,43 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
-CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is necessary to
-provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL
-to query CSV files.
+```sql
+CREATE EXTERNAL TABLE test
+    STORED AS CSV
+    WITH HEADER ROW
+    LOCATION 'c:/tmp/test.csv';
+```
+
+Create an external table with partitioned CSV files
+
+```sql
+CREATE EXTERNAL TABLE p_test
+    STORED AS CSV
+    WITH HEADER ROW
+    PARTITIONED BY (year)
+    LOCATION 'c:/tmp/data';
+```
+
+The above statement looks for CSV files in the `c:/tmp/data` directory and creates a table with
+the columns and data types inferred, as well as adding a column for the partition:
+
+TODO: describe rules for inference. which files does it look at, how many rows? is it configurable?

Review Comment:
   Thanks, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
