Blizzara commented on PR #10404:
URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2142169826

   Hey, this PR made the CSV defaults for writing and reading inconsistent: 
writing now defaults to **not** emitting a header, while reading still defaults 
to **expecting** one. Previously both sides defaulted to **having** a header, 
i.e. only the write side has flipped.
   
   I noticed this because one of our tests started failing. A simplified 
version is below; it passed before this commit and fails after it:
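   ```rust
   use datafusion::dataframe::DataFrameWriteOptions;
   use datafusion::error::Result;
   use datafusion::prelude::*;

   #[tokio::test]
   async fn write_then_read_csv() -> Result<()> {
       // `test_file_path` is a test helper that is not shown here.
       let test_path = test_file_path("csv/");

       let ctx = SessionContext::new();

       // Write a single-row DataFrame with the default write options.
       let df = ctx.sql("SELECT 1 AS col").await?;
       assert_eq!(df.clone().count().await?, 1);
       df.write_csv(&test_path, DataFrameWriteOptions::default(), None).await?;

       // Read it back with the default read options; this assertion now fails.
       let df = ctx.read_csv(&test_path, CsvReadOptions::default()).await?;
       assert_eq!(df.count().await?, 1);

       Ok(())
   }
   ```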
   
   It can be fixed by passing 
`Some(CsvOptions::default().with_has_header(true))` to `write_csv` (restoring 
the old behavior), or alternatively by passing 
`CsvReadOptions::default().has_header(false)` to `read_csv` so that it matches 
the write.
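
To make the failure mode concrete, here is a minimal toy model of the header handling. It does not use DataFusion at all: `write_csv` and `count_rows` are hypothetical helpers written only for this illustration, and they assume the reader simply skips the first line when it expects a header.

```rust
// Toy model of CSV header handling; this is NOT the DataFusion API.
// `write_csv` and `count_rows` are hypothetical helpers for illustration.

/// Serialize a single-column table, optionally emitting a header line.
fn write_csv(column: &str, values: &[i64], has_header: bool) -> String {
    let mut out = String::new();
    if has_header {
        out.push_str(column);
        out.push('\n');
    }
    for v in values {
        out.push_str(&v.to_string());
        out.push('\n');
    }
    out
}

/// Count data rows, skipping the first line when the reader expects a header.
fn count_rows(csv: &str, has_header: bool) -> usize {
    let lines = csv.lines().count();
    if has_header {
        lines.saturating_sub(1)
    } else {
        lines
    }
}

fn main() {
    // Mismatch: the writer omits the header, but the reader expects one,
    // so the only data row is consumed as the header.
    let no_header = write_csv("col", &[1], false);
    println!("mismatch: {}", count_rows(&no_header, true));

    // Fix 1: write the header so that it matches the reader.
    let with_header = write_csv("col", &[1], true);
    println!("write header: {}", count_rows(&with_header, true));

    // Fix 2: tell the reader that there is no header.
    println!("read without header: {}", count_rows(&no_header, false));
}
```

In this model the mismatched case counts zero rows, while either fix brings the count back to one, which mirrors the two workarounds described above.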
   
   I don't mind which way it works, though I find the mismatch somewhat 
surprising. Mainly I wanted to flag this in case the change was not 
intentional.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

