brunal commented on code in PR #16342:
URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185262228
##########
datafusion/datasource/src/file_sink_config.rs:
##########
@@ -77,13 +79,34 @@ pub trait FileSink: DataSink {
.runtime_env()
.object_store(&config.object_store_url)?;
let (demux_task, file_stream_rx) = start_demuxer_task(config, data, context);
- self.spawn_writer_tasks_and_join(
- context,
- demux_task,
- file_stream_rx,
- object_store,
- )
- .await
+ let mut num_rows = self
+ .spawn_writer_tasks_and_join(
+ context,
+ demux_task,
+ file_stream_rx,
+ Arc::clone(&object_store),
+ )
+ .await?;
+ if num_rows == 0 {
+ // If no rows were written, then no files are output either.
Review Comment:
You say that no rows means no file was created.
But then you say we write an empty RecordBatch to ensure a file gets created.
Except an empty RecordBatch has no rows (at least when written to a Parquet file).
These two statements contradict each other.
In practice, this PR caused a regression: we can no longer write an empty
RecordBatch to Parquet, because the code here tries to write it a second time
and we get an error.
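A minimal sketch of why the `num_rows == 0` check is ambiguous, assuming a hypothetical toy sink in which creating a file that already exists fails. `ToySink`, `finish_by_row_count` and `finish_by_file_count` are illustrative names, not DataFusion APIs, and the real error raised after this PR is not reproduced here. It shows that an explicitly written empty batch yields a file with zero rows, so counting rows cannot distinguish "no file was written" from "an empty file was written".

```rust
use std::collections::HashMap;

/// Hypothetical toy sink (NOT DataFusion's API): records path -> row count.
#[derive(Default)]
struct ToySink {
    files: HashMap<String, usize>,
}

impl ToySink {
    /// Write a batch of `rows` rows to `path`. In this toy model, writing
    /// to a path that already exists is an error (an assumption).
    fn write_batch(&mut self, path: &str, rows: usize) -> Result<(), String> {
        if self.files.contains_key(path) {
            return Err(format!("file {path} already exists"));
        }
        self.files.insert(path.to_string(), rows);
        Ok(())
    }

    /// Total rows across all files written so far.
    fn total_rows(&self) -> usize {
        self.files.values().sum()
    }
}

/// Row-count rule: treats "0 rows" as "0 files" and writes a fallback
/// empty file, even if an empty file was already written.
fn finish_by_row_count(sink: &mut ToySink, path: &str) -> Result<(), String> {
    if sink.total_rows() == 0 {
        sink.write_batch(path, 0)?;
    }
    Ok(())
}

/// Hypothetical alternative: only fall back when no file exists at all.
fn finish_by_file_count(sink: &mut ToySink, path: &str) -> Result<(), String> {
    if sink.files.is_empty() {
        sink.write_batch(path, 0)?;
    }
    Ok(())
}

fn main() {
    // The caller explicitly writes one empty batch: a 0-row file now exists.
    let mut a = ToySink::default();
    a.write_batch("out.parquet", 0).unwrap();
    // Row count is still 0, so the fallback writes the same file again.
    println!("row-count check: {:?}", finish_by_row_count(&mut a, "out.parquet"));

    let mut b = ToySink::default();
    b.write_batch("out.parquet", 0).unwrap();
    println!("file-count check: {:?}", finish_by_file_count(&mut b, "out.parquet"));
}
```

In this toy model the row-count rule fails on the second write, while a rule keyed on whether any file exists does not; whether that maps cleanly onto the demuxer's actual behavior is an open question for the PR.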