alamb commented on code in PR #16342:
URL: https://github.com/apache/datafusion/pull/16342#discussion_r2138987979
##########
datafusion/core/src/datasource/file_format/csv.rs:
##########
@@ -795,6 +796,25 @@ mod tests {
Ok(())
}
+ #[tokio::test]
+ async fn test_csv_write_empty_file() -> Result<()> {
Review Comment:
Can you please also add a test of what happens if we are writing into a
directory (aka a partitioned file)?
Like a test that targets a path of `/foo`
In this case, I think I would expect that no files are created when the data
stream is empty
##########
datafusion/datasource/src/file_sink_config.rs:
##########
@@ -77,13 +79,32 @@ pub trait FileSink: DataSink {
.runtime_env()
.object_store(&config.object_store_url)?;
let (demux_task, file_stream_rx) = start_demuxer_task(config, data,
context);
- self.spawn_writer_tasks_and_join(
- context,
- demux_task,
- file_stream_rx,
- object_store,
- )
- .await
+ let mut num_rows = self
+ .spawn_writer_tasks_and_join(
+ context,
+ demux_task,
+ file_stream_rx,
+ Arc::clone(&object_store),
+ )
+ .await?;
+ if num_rows == 0 {
Review Comment:
I think it would help here to add some comments with context about what it
is doing. For example, something like
```suggestion
// If no rows were written, then no files are output either. In this
case
// send an empty record batch through to ensure the output file is
generated
if num_rows == 0 {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]