devinjdangelo commented on code in PR #9382:
URL: https://github.com/apache/arrow-datafusion/pull/9382#discussion_r1507575479
##########
datafusion/sql/src/statement.rs:
##########
@@ -709,25 +711,42 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
}
};
- // TODO, parse options as Vec<(String, String)> to avoid this
conversion
- let options = statement
- .options
- .iter()
- .map(|(s, v)| (s.to_owned(), v.to_string()))
- .collect::<Vec<(String, String)>>();
+ let mut options = HashMap::new();
Review Comment:
I think it is a good idea to move parsing of the options here, so we no
longer need an enum for CopyOptions :+1:
##########
datafusion/sqllogictest/test_files/copy.slt:
##########
@@ -251,30 +261,30 @@ query IT
COPY source_table
TO 'test_files/scratch/copy/table_with_options/'
(format parquet,
-compression snappy,
-'compression::col1' 'zstd(5)',
-'compression::col2' snappy,
-max_row_group_size 12345,
-data_pagesize_limit 1234,
-write_batch_size 1234,
-writer_version 2.0,
-dictionary_page_size_limit 123,
-created_by 'DF copy.slt',
-column_index_truncate_length 123,
-data_page_row_count_limit 1234,
-bloom_filter_enabled true,
-'bloom_filter_enabled::col1' false,
-'bloom_filter_fpp::col2' 0.456,
-'bloom_filter_ndv::col2' 456,
-encoding plain,
-'encoding::col1' DELTA_BINARY_PACKED,
-'dictionary_enabled::col2' true,
-dictionary_enabled false,
-statistics_enabled page,
-'statistics_enabled::col2' none,
-max_statistics_size 123,
-bloom_filter_fpp 0.001,
-bloom_filter_ndv 100
+'format.parquet.compression' snappy,
Review Comment:
It would be nice if the config options didn't need to be literal quoted e.g.
'format.parquet.compression'. I think it would be ideal if DFParser were aware
of the custom syntax we are creating for options including the dots and double
colons, so a user could simply write:
```sql
COPY table
TO 'file.parquet'
(
format.parquet.compression snappy,
format.parquet.compression::col1 zstd(5)
)
```
Related is #9274
##########
datafusion/expr/src/logical_plan/dml.rs:
##########
@@ -39,46 +39,29 @@ pub struct CopyTo {
/// Determines which, if any, columns should be used for hive-style
partitioned writes
pub partition_by: Vec<String>,
/// Arbitrary options as tuples
- pub copy_options: CopyOptions,
-}
-
-/// When the logical plan is constructed from SQL, CopyOptions
-/// will contain arbitrary string tuples which must be parsed into
-/// FileTypeWriterOptions. When the logical plan is constructed directly
-/// from rust code (such as via the DataFrame API), FileTypeWriterOptions
-/// can be provided directly, avoiding the run time cost and fallibility of
-/// parsing string based options.
-#[derive(Clone)]
-pub enum CopyOptions {
- /// Holds StatementOptions parsed from a SQL statement
- SQLOptions(StatementOptions),
- /// Holds FileTypeWriterOptions directly provided
- WriterOptions(Box<FileTypeWriterOptions>),
+ pub format_options: FormatOptions,
+ ///
+ pub source_option_tuples: HashMap<String, String>,
Review Comment:
Does source_option_tuples hold any option which does not fall into a format
specific namespace?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]