devinjdangelo commented on issue #9684: URL: https://github.com/apache/arrow-datafusion/issues/9684#issuecomment-2007077634
> Given I can see that the default behavior may not be ideal, perhaps we can add a configuration setting that controls how non-existent paths that don't end with / are handled

We previously had `single_file_output` which was a statement level config that determined if the path should be treated as a file or directory. We intentionally removed this in https://github.com/apache/arrow-datafusion/pull/9041 in favor of inference based on the path ending in '/'. We could add a session level config to control how a path is interpreted as @alamb suggests.

Perhaps we could also improve the inference logic by additionally checking for the presence of a valid file extension before concluding a path is a file. E.g.:

1. `tmp/dataset/` -> is a folder since it ends in `/`
2. `tmp/dataset` -> is still a folder since it does not end in `/` but has no valid file extension
3. `tmp/file.parquet` -> is a file since it does not end in `/` and has a valid file extension `.parquet`
4. `tmp/file.parquet/` -> is a folder since it ends in `/`
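
To make the four cases concrete, here is a minimal sketch of what the extension-aware inference could look like. It is not DataFusion's actual code: `OutputTarget`, `infer_output_target`, and the `KNOWN_EXTENSIONS` list are hypothetical names chosen for illustration, and which extensions count as "valid" is an assumption.

```rust
use std::path::Path;

/// Hypothetical list of extensions treated as "valid" file extensions.
/// The real set would depend on the formats the engine can write.
const KNOWN_EXTENSIONS: &[&str] = &["parquet", "csv", "json", "arrow", "avro"];

/// Hypothetical result type: how an output path should be interpreted.
#[derive(Debug, PartialEq, Eq)]
enum OutputTarget {
    File,
    Folder,
}

/// Infer whether `path` names a single file or a folder.
/// A trailing `/` always means folder. Otherwise the path is a file
/// only if it carries one of the known extensions.
fn infer_output_target(path: &str) -> OutputTarget {
    if path.ends_with('/') {
        return OutputTarget::Folder;
    }

    // `Path::extension` returns the part after the final `.` of the last
    // path component, or `None` when there is no extension.
    let has_known_extension = Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| {
            KNOWN_EXTENSIONS
                .iter()
                .any(|known| known.eq_ignore_ascii_case(ext))
        })
        .unwrap_or(false);

    if has_known_extension {
        OutputTarget::File
    } else {
        OutputTarget::Folder
    }
}
```

With this sketch, `tmp/dataset/`, `tmp/dataset`, and `tmp/file.parquet/` all resolve to `OutputTarget::Folder`, and only `tmp/file.parquet` resolves to `OutputTarget::File`. A session level config, if added, could override the result of such a function rather than replace it.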
