Blizzara commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2142169826
Hey, this changed the behavior of writing and reading CSVs to be slightly inconsistent: writing now defaults to **not** writing a header, while reading defaults to **having** a header. Previously both defaulted to **having** a header, i.e. the write side has flipped.

I noticed because one of our tests started failing. A simplified version is below; it worked previously and fails after this commit:

```
#[tokio::test]
async fn write_then_read_csv() -> Result<()> {
    let test_path = test_file_path("csv/");

    let ctx = SessionContext::new();
    let df = ctx.sql("SELECT 1 AS col").await?;
    assert_eq!(df.to_owned().count().await?, 1);

    df.write_csv(&test_path, DataFrameWriteOptions::default(), None).await?;

    let df = ctx.read_csv(&test_path, CsvReadOptions::default()).await?;
    assert_eq!(df.count().await?, 1);

    Ok(())
}
```

It can be fixed by passing `Some(CsvOptions::default().with_has_header(true))` to the `write_csv` call (restoring the old behavior), or alternatively by changing the `read_csv` call to use `CsvReadOptions::default().has_header(false)` so that it matches the write.

It doesn't really matter to me which way it works, though I find the mismatch somewhat surprising. Mainly I wanted to flag this in case the change was not expected or intentional.
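For anyone who wants to see the effect without DataFusion in the loop, here is a minimal stand-alone sketch of the mismatch. The helpers `write_csv` and `count_rows` are hypothetical stand-ins written for illustration, not DataFusion's API or implementation. The sketch assumes a reader that expects a header simply treats the first line as one, which would explain a row count of 0 instead of 1.

```rust
// Hypothetical stand-ins for a CSV writer and reader; NOT DataFusion's API.

/// Serialize a single-column table, optionally emitting a header line.
fn write_csv(column: &str, values: &[&str], has_header: bool) -> String {
    let mut out = String::new();
    if has_header {
        out.push_str(column);
        out.push('\n');
    }
    for value in values {
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Count data rows, treating the first line as a header when `has_header` is set.
fn count_rows(csv: &str, has_header: bool) -> usize {
    let lines = csv.lines().count();
    if has_header {
        // The first line is assumed to be a header and is not counted as data.
        lines.saturating_sub(1)
    } else {
        lines
    }
}
```

Writing with `has_header = false` and reading with `has_header = true` loses the only data row in this model, while making the two sides agree (either both `true` or both `false`) preserves it.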