hknlof opened a new pull request, #17009: URL: https://github.com/apache/datafusion/pull/17009
https://github.com/apache/datafusion/issues/13323

## Which issue does this PR close?

- Closes #13323

## Rationale for this change

`DF.write_parquet` writes multiple files / one directory even if `options.single_file_output` is set.

## What changes are included in this PR?

Introduce an internal `.single` extension to mark output paths that must be written as a single file.

## Are these changes tested?

Yes, tests are part of this PR.

## Are there any user-facing changes?

Not in this implementation. There might be if we decide to move to a `FileSinkConfig`-based solution.

Quoting https://github.com/apache/datafusion/issues/13323#issuecomment-2483134799:

> It seems hard to control the behavior of `write_parquet` by `single_file_output` (and I've noticed that it's never used), what really controls whether to generate a single file output is determining the suffix (in `start_demuxer_task()`), there are several methods I can think of to handle this issue:
>
> 1. We can add a suffix like `.single` to the paths that require generating a single file, and then recognize this suffix in `start_demuxer_task()`.
> 2. Give up `single_file_output` in `DataFrameWriteOptions`, use `FileSinkConfig` instead to control single file behavior.
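
For illustration, option 1 from the quoted comment can be sketched roughly as below. This is a standalone sketch and not the code in this PR: `SINGLE_FILE_MARKER`, `OutputLayout`, `mark_single_file`, and `resolve_output` are hypothetical names, and the real logic in `start_demuxer_task()` may differ. The only idea taken from the discussion is that the caller appends a `.single` suffix and the demuxer recognizes and strips it.

```rust
// Hypothetical marker; the PR description only says it introduces an
// internal `.single` extension.
const SINGLE_FILE_MARKER: &str = ".single";

/// How the writer should lay out its output (illustrative type, not a
/// DataFusion type).
#[derive(Debug, PartialEq)]
enum OutputLayout {
    /// Write exactly one file at this path.
    SingleFile(String),
    /// Treat the path as a directory that may receive several files.
    Directory(String),
}

/// Caller side: tag the path when single-file output was requested.
fn mark_single_file(path: &str, single_file_output: bool) -> String {
    if single_file_output {
        format!("{}{}", path, SINGLE_FILE_MARKER)
    } else {
        path.to_string()
    }
}

/// Demuxer side: strip the marker if present and choose the layout.
fn resolve_output(path: &str) -> OutputLayout {
    match path.strip_suffix(SINGLE_FILE_MARKER) {
        // Marker found: the remaining prefix is the real target file.
        Some(real_path) => OutputLayout::SingleFile(real_path.to_string()),
        // No marker: fall back to directory-style output.
        None => OutputLayout::Directory(path.to_string()),
    }
}
```

In this sketch, `resolve_output(&mark_single_file("out/data", true))` yields `OutputLayout::SingleFile` for `out/data`, while the same call with `false` yields `OutputLayout::Directory`. Because the marker is stripped before any file is created, it never appears in the final output path, which is why the suffix can stay internal.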