Jefffrey commented on code in PR #20813:
URL: https://github.com/apache/datafusion/pull/20813#discussion_r3105120386
##########
datafusion/common/src/config.rs:
##########
@@ -2927,6 +2938,13 @@ config_namespace! {
pub terminator: Option<u8>, default = None
pub escape: Option<u8>, default = None
pub double_quote: Option<bool>, default = None
+ /// Quote style for CSV writing.
+ /// One of: "Always", "Necessary", "NonNumeric", "Never"
+ pub quote_style: CsvQuoteStyle, default = CsvQuoteStyle::Necessary
+ /// Whether to ignore leading whitespace in string values when writing CSV.
+ pub ignore_leading_whitespace: Option<bool>, default = None
+ /// Whether to ignore trailing whitespace in string values when writing CSV.
+ pub ignore_trailing_whitespace: Option<bool>, default = None
/// Specifies whether newlines in (quoted) values are supported.
Review Comment:
Perhaps it's to do with the hierarchy of configs and/or SQL parsing 🤔
@alamb do you happen to know a reason for having `Option<bool>` instead of
plain `bool` for cases where it'll end up being true or false anyway? (i.e.
`None` doesn't represent a third state, but eventually maps to either true or
false)
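For illustration only, here is a minimal Rust sketch of the layering idea speculated about above. It assumes (this is not confirmed anywhere in this thread) that `None` is meant as "not set at this level, inherit from somewhere else". `TableCsvOptions`, `SessionDefaults`, and `effective_double_quote` are hypothetical names, not DataFusion APIs.

```rust
/// Hypothetical per-table CSV options: `None` means "not set at this level".
#[derive(Debug, Default, Clone, Copy)]
pub struct TableCsvOptions {
    pub double_quote: Option<bool>,
}

/// Hypothetical session-wide defaults, which are always concrete values.
#[derive(Debug, Clone, Copy)]
pub struct SessionDefaults {
    pub double_quote: bool,
}

/// Resolve the effective value: an explicit table-level setting wins,
/// otherwise the session default is inherited.
pub fn effective_double_quote(table: TableCsvOptions, session: SessionDefaults) -> bool {
    table.double_quote.unwrap_or(session.double_quote)
}
```

Under that assumption, a plain `bool` could not tell "explicitly set to the default" apart from "never set", which only matters if some later layer needs to know the difference.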
##########
datafusion/sqllogictest/test_files/csv_files.slt:
##########
@@ -380,3 +380,200 @@ SET datafusion.optimizer.repartition_file_min_size = 10485760;
statement ok
drop table stored_table_with_cr_terminator;
+
+# Test quote_style option
+
+statement ok
+CREATE TABLE quote_style_source (
+ int_col INT,
+ string_col TEXT,
+ float_col DOUBLE
+) AS VALUES
+(1, 'hello', 1.1),
+(2, 'world', 2.2),
+(3, 'comma,value', 3.3);
+
+# QuoteStyle::Always - all fields are quoted
+query I
+COPY quote_style_source TO 'test_files/scratch/csv_files/quote_style_always.csv'
+STORED AS csv
+OPTIONS ('format.has_header' 'true', 'format.quote_style' 'Always');
+----
+3
+
+statement ok
+CREATE EXTERNAL TABLE stored_quote_style_always (
+ int_col TEXT,
+ string_col TEXT,
+ float_col TEXT
+) STORED AS CSV
+LOCATION 'test_files/scratch/csv_files/quote_style_always.csv'
+OPTIONS ('format.has_header' 'true', 'format.quote_style' 'Never');
+
+# All values should have been quoted, but reading them back strips the quotes
Review Comment:
Could do something like this:
```sql
statement ok
CREATE EXTERNAL TABLE stored_quote_style_nonnumeric (
whole_file TEXT
) STORED AS CSV
LOCATION 'test_files/scratch/csv_files/quote_style_nonnumeric.csv'
OPTIONS ('format.has_header' 'true', 'format.delimiter' '@');

query T
select * from stored_quote_style_nonnumeric;
----
1,"hello",1.1
2,"world",2.2
3,"comma,value",3.3
```
- This pretty much reads the entire file as a single column, by choosing a
delimiter that doesn't appear in the data, so the quotes written to the file
stay visible in the query output
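The trick depends on the chosen delimiter never occurring in the written file. As a rough sketch of how one might check that, the helper below picks the first candidate byte that is absent from the data. `unused_delimiter` is a hypothetical helper written for this comment and is not part of DataFusion or the test suite.

```rust
/// Return the first candidate byte that never occurs in `data`, if any.
/// Hypothetical helper: useful for choosing a "whole line as one column"
/// delimiter such as `@` in the example above.
pub fn unused_delimiter(data: &[u8], candidates: &[u8]) -> Option<u8> {
    candidates.iter().copied().find(|c| !data.contains(c))
}
```

For the three rows in this test, `,` is ruled out because it appears in the data, while `@` does not appear and is therefore a safe choice.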