tinfoil-knight commented on code in PR #9744:
URL: https://github.com/apache/arrow-datafusion/pull/9744#discussion_r1536222673


##########
datafusion/sqllogictest/test_files/copy.slt:
##########
@@ -514,6 +514,44 @@ OPTIONS (
 );
 
 
+# Format Options Support with format in OPTIONS i.e. COPY { table_name | query 
} TO 'file_name' OPTIONS (format <format-name>, ...)
+
+query I
+COPY (select * from (values (1))) to 'test_files/scratch/copy/'
+OPTIONS (format parquet);
+----
+1
+
+query I
+COPY (select * from (values (1))) to 'test_files/scratch/copy/'
+OPTIONS (format parquet, compression 'zstd(10)');
+----
+1
+
+query I
+COPY (select * from (values (1))) to 'test_files/scratch/copy/'
+OPTIONS (format json, compression gzip);
+----
+1
+
+query I
+COPY (select * from (values (1))) to 'test_files/scratch/copy/'
+OPTIONS (
+    format csv,
+    has_header false,
+    compression xz,
+    datetime_format '%FT%H:%M:%S.%9f',
+    delimiter ';',
+    null_value 'NULLVAL'
+);
+----
+1
+
+query error DataFusion error: Invalid or Unsupported Configuration: This 
feature is not implemented: Unknown FileType: NOTVALIDFORMAT

Review Comment:
   The error feels a bit long to me compared to others. We can reduce it to:
   `DataFusion error: This feature is not implemented: Unknown FileType: 
NOTVALIDFORMAT`
   
   I didn't do this because all the other branches were wrapping downstream 
errors with `DataFusionError::Configuration` & then returning them.
   
   WDYT?



##########
datafusion/sql/src/statement.rs:
##########
@@ -867,26 +867,32 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 DataFusionError::Configuration(format!("Unknown FileType {}", 
file_type))
             })?
         } else {
-            let e = || {
-                DataFusionError::Configuration(
-                "Format not explicitly set and unable to get file extension! 
Use STORED AS to define file format."
-                    .to_string(),
-            )
-            };
-            // try to infer file format from file extension
-            let extension: &str = &Path::new(&statement.target)
-                .extension()
-                .ok_or_else(e)?
-                .to_str()
-                .ok_or_else(e)?
-                .to_lowercase();
-
-            FileType::from_str(extension).map_err(|e| {
-                DataFusionError::Configuration(format!(
-                    "{}. Use STORED AS to define file format.",
-                    e
-                ))
-            })?
+            if let Some(format) = options.remove("format") {
+                // try to infer file format from the "format" key in options
+                FileType::from_str(&format)
+                    .map_err(|e| DataFusionError::Configuration(format!("{}", 
e)))?
+            } else {
+                let e = || {
+                    DataFusionError::Configuration(
+                    "Format not explicitly set and unable to get file 
extension! Use STORED AS to define file format."
+                        .to_string(),
+                )
+                };
+                // try to infer file format from file extension
+                let extension: &str = &Path::new(&statement.target)
+                    .extension()
+                    .ok_or_else(e)?
+                    .to_str()
+                    .ok_or_else(e)?
+                    .to_lowercase();
+
+                FileType::from_str(extension).map_err(|e| {
+                    DataFusionError::Configuration(format!(
+                        "{}. Use STORED AS to define file format.",
+                        e
+                    ))
+                })?
+            }

Review Comment:
   One thing to note is that, currently, order of priority for resolving the 
`file_type` is:
   1. `STORED AS`
   2. Legacy: `COPY {} TO {} OPTIONS (format <>)`
   3. File extension
   
   Let me know if we care about the order of resolution & if this requires any 
change.



##########
datafusion/sql/src/statement.rs:
##########
@@ -867,26 +867,32 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 DataFusionError::Configuration(format!("Unknown FileType {}", 
file_type))
             })?
         } else {
-            let e = || {
-                DataFusionError::Configuration(
-                "Format not explicitly set and unable to get file extension! 
Use STORED AS to define file format."
-                    .to_string(),
-            )
-            };
-            // try to infer file format from file extension
-            let extension: &str = &Path::new(&statement.target)
-                .extension()
-                .ok_or_else(e)?
-                .to_str()
-                .ok_or_else(e)?
-                .to_lowercase();
-
-            FileType::from_str(extension).map_err(|e| {
-                DataFusionError::Configuration(format!(
-                    "{}. Use STORED AS to define file format.",
-                    e
-                ))
-            })?
+            if let Some(format) = options.remove("format") {
+                // try to infer file format from the "format" key in options
+                FileType::from_str(&format)
+                    .map_err(|e| DataFusionError::Configuration(format!("{}", 
e)))?
+            } else {

Review Comment:
   Note for Reviewer:
   I couldn't figure out a way to do an early return (while handling errors 
properly) from the if-block so had to move all this code. 
   
   Please let me know if you've any better suggestions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to