nlimpid opened a new pull request, #19370: URL: https://github.com/apache/datafusion/pull/19370
## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/16240.

## Rationale for this change

If the input stream yields no RecordBatch at all, nothing is sent downstream, so the writer never gets a chance to produce a valid file. I added a small fallback: when `single_file_output` is enabled and no batches were received, we send a single empty RecordBatch with the input schema.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

1. I'm not fully convinced this logic belongs in the demuxer. Conceptually, it might be cleaner to handle this one layer downstream, on the consumer side. However, that layer does not currently have access to the schema, so moving the logic there would require a larger refactor. For now, I chose the minimal change that fixes the issue while keeping the impact small.
2. Arrow seems to be a special case, and there wasn't much test coverage around it. I have written some test cases for it.
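The fallback described in the rationale can be sketched roughly as follows. This is a standalone illustration, not the code from the PR: `Schema`, `RecordBatch`, `demux_single_file`, and `run_demux` are hypothetical stand-ins built only on the standard library, and a `std::sync::mpsc` channel stands in for whatever channel the real demuxer uses. In the arrow crate, `RecordBatch::new_empty` plays the role of the `new_empty` helper defined here.

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical stand-in for an Arrow schema (assumption for illustration).
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

// Hypothetical stand-in for an Arrow RecordBatch: only the schema and row count.
#[derive(Clone, Debug, PartialEq)]
struct RecordBatch {
    schema: Schema,
    num_rows: usize,
}

impl RecordBatch {
    // Builds a zero-row batch that still carries the schema.
    fn new_empty(schema: Schema) -> Self {
        RecordBatch { schema, num_rows: 0 }
    }
}

// Forwards every input batch to the writer side of the channel.
// If single-file output is requested and the input produced no batches,
// sends one empty batch so the writer can still emit a valid file.
// Returns the number of batches sent.
fn demux_single_file(
    input: impl IntoIterator<Item = RecordBatch>,
    schema: &Schema,
    single_file_output: bool,
    tx: &Sender<RecordBatch>,
) -> usize {
    let mut sent = 0;
    for batch in input {
        tx.send(batch).expect("receiver dropped");
        sent += 1;
    }
    if single_file_output && sent == 0 {
        tx.send(RecordBatch::new_empty(schema.clone()))
            .expect("receiver dropped");
        sent += 1;
    }
    sent
}

// Convenience wrapper: runs the demuxer and collects what the writer would see.
fn run_demux(
    input: Vec<RecordBatch>,
    schema: &Schema,
    single_file_output: bool,
) -> Vec<RecordBatch> {
    let (tx, rx) = channel();
    demux_single_file(input, schema, single_file_output, &tx);
    drop(tx); // close the channel so the receiving iterator terminates
    rx.iter().collect()
}
```

Calling `run_demux(vec![], &schema, true)` yields exactly one zero-row batch carrying `schema`, while the same call with `single_file_output` set to `false` yields nothing. Non-empty inputs pass through unchanged in both cases.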
