hknlof opened a new pull request, #17009:
URL: https://github.com/apache/datafusion/pull/17009

   https://github.com/apache/datafusion/issues/13323
   
   ## Which issue does this PR close?
   
   - Closes #13323
   
   ## Rationale for this change
   
   `DF.write_parquet` writes a directory of one or more files instead of a single file, even when `options.single_file_output` is set.
   
   ## What changes are included in this PR?
   
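   Introduce an internal `.single` path suffix that marks output paths which must be written as a single file. The suffix is recognized in `start_demuxer_task()` (approach 1 in the comment quoted below).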
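   A minimal sketch of the `.single` suffix idea, written in std-only Rust for illustration. Everything here is hypothetical: `SINGLE_FILE_MARKER`, `OutputMode`, `tag_output_path` and `resolve_output_mode` are not DataFusion APIs. The extension-based fallback is also a simplified assumption about how the demuxer chooses between single-file and directory output. The sketch shows the round trip: the writer appends `.single` when `single_file_output` is set, and the demuxer strips the marker and forces single-file mode.

```rust
/// Hypothetical marker suffix; the PR's actual naming and handling may differ.
const SINGLE_FILE_MARKER: &str = ".single";

#[derive(Debug, PartialEq)]
enum OutputMode {
    /// Write exactly one file at this path.
    SingleFile(String),
    /// Write one or more files inside this directory.
    Directory(String),
}

/// Writer side (hypothetical): tag the path when single-file output was requested.
fn tag_output_path(path: &str, single_file_output: bool) -> String {
    if single_file_output {
        format!("{}{}", path, SINGLE_FILE_MARKER)
    } else {
        path.to_string()
    }
}

/// Demuxer side (hypothetical): a tagged path always means a single file,
/// and the marker is stripped before writing.
fn resolve_output_mode(path: &str) -> OutputMode {
    if let Some(stripped) = path.strip_suffix(SINGLE_FILE_MARKER) {
        return OutputMode::SingleFile(stripped.to_string());
    }
    // Simplified fallback (assumption): a last path segment with an extension
    // is treated as a file; anything else, including a trailing '/', as a directory.
    let last_segment = path.rsplit('/').next().unwrap_or("");
    if !path.ends_with('/') && last_segment.contains('.') {
        OutputMode::SingleFile(path.to_string())
    } else {
        OutputMode::Directory(path.to_string())
    }
}
```

   For example, `resolve_output_mode(&tag_output_path("data/output", true))` yields `SingleFile` for `data/output`, whereas the untagged `"data/output"` resolves to a directory. One trade-off of an in-band marker is that a user path that genuinely ends in `.single` would be misread. This is part of the case for the `FileSinkConfig`-based alternative.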
   
   ## Are these changes tested?
   
   Yes, tests are part of this PR.
   
   ## Are there any user-facing changes?
   
   Not in this implementation. There might be if we decide to move to a `FileSinkConfig`-based solution.
   
   Quoting: https://github.com/apache/datafusion/issues/13323#issuecomment-2483134799
   
   > It seems hard to control the behavior of `write_parquet` by `single_file_output` (and I've noticed that It's never used), what really controls whether to generate a single file output is determining the suffix (in `start_demuxer_task()`), there are several methods I can think of to handle this issue:
   > 
   > 1. We can add a suffix like `.single` to the paths that require generating a single file, and then recognize this suffix in `start_demuxer_task()`.
   > 2. Give up `single_file_output` in `DataFrameWriteOptions`, use `FileSinkConfig` instead to control single file behavior.
   > 
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
