Jefffrey commented on issue #5383:
URL: 
https://github.com/apache/arrow-datafusion/issues/5383#issuecomment-1442887119

   MRE:
   
   ```rust
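   // Imports needed to compile the example; `Partitioning` and `lit` come from the prelude.
   use datafusion::error::Result;
   use datafusion::prelude::*;

   #[tokio::main]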
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       ctx.sql("select 1")
           .await?
           .repartition(Partitioning::Hash(vec![lit(0)], 5))?
           .write_csv("csv")
           .await?;
   
       Ok(())
   }
   ```
   
   Output:
   
   ```
   jeffrey:~/Code/arrow-datafusion$ tree -h csv
   [4.0K]  csv
   ├── [   0]  part-0.csv
   ├── [   0]  part-1.csv
   ├── [   0]  part-2.csv
   ├── [   0]  part-3.csv
   └── [  11]  part-4.csv
   
   0 directories, 5 files
   ```
   
   You can see it's due to empty partitions still being written out to disk.
   
   Not sure whether it's possible, in the writing logic, to check if a partition is empty before attempting to write it to disk?
   
   
https://github.com/apache/arrow-datafusion/blob/1309267e713523bc5d1c23e34dcc934d6d30c22b/datafusion/core/src/physical_plan/file_format/csv.rs#L297-L314
   
   - It would need to work without executing the plan twice
   - It could also lead to a case where no files are written at all (if all partitions are empty); unsure whether that is desirable
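
For illustration, here is a rough sketch of what "check while writing" could look like in a single pass: the output file is only created once the first non-empty batch shows up. This is a std-only stand-in, not DataFusion's actual CSV writer. `write_partition_lazily` is a hypothetical helper, and each batch is modelled as a `Vec<String>` of already-formatted rows rather than a real `RecordBatch`.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: writes one partition's batches to `path`.
/// Each batch is modelled as a list of already-formatted CSV rows.
/// The file is created lazily, on the first non-empty batch, so a
/// partition that yields no rows leaves no file behind.
/// Returns the number of rows written.
pub fn write_partition_lazily<I>(path: &Path, batches: I) -> io::Result<usize>
where
    I: IntoIterator<Item = Vec<String>>,
{
    let mut writer: Option<BufWriter<File>> = None;
    let mut rows_written = 0;

    // Single pass over the batches: nothing is executed twice.
    for batch in batches {
        if batch.is_empty() {
            continue; // nothing to write, so no reason to create the file yet
        }
        // Create the file on first use; later batches reuse the same writer.
        if writer.is_none() {
            writer = Some(BufWriter::new(File::create(path)?));
        }
        let w = writer.as_mut().expect("writer was just initialised");
        for row in &batch {
            writeln!(w, "{}", row)?;
        }
        rows_written += batch.len();
    }

    // Flush only if a file was actually opened.
    if let Some(mut w) = writer {
        w.flush()?;
    }
    Ok(rows_written)
}
```

With this shape, a caller that gets `Ok(0)` back for every partition ends up with no files at all, which is exactly the second point above.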
   
   Probably easiest to just update the documentation to reflect this behaviour?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

