Blizzara commented on PR #10404:
URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2142169826
Hey, this changed the default behavior of writing and reading CSVs to be
slightly inconsistent: writing now defaults to **not** writing a header, while
reading still defaults to **having** a header. Previously both defaulted to
**having** a header, i.e. the write side has flipped.
I noticed because one of our tests started failing. A simplified version is
below; it worked previously and fails after this commit:
```
#[tokio::test]
async fn write_then_read_csv() -> Result<()> {
    // `test_file_path` is a test helper whose definition is not shown here.
    let test_path = test_file_path("csv/");
    let ctx = SessionContext::new();
    let df = ctx.sql("SELECT 1 AS col").await?;
    assert_eq!(df.clone().count().await?, 1);

    // Write with default options (no header after this commit).
    df.write_csv(&test_path, DataFrameWriteOptions::default(), None).await?;

    // Read with default options (expects a header).
    let df = ctx.read_csv(&test_path, CsvReadOptions::default()).await?;
    assert_eq!(df.count().await?, 1);
    Ok(())
}
```
It can be fixed by passing
`Some(CsvOptions::default().with_has_header(true))` as the last argument to
`write_csv` (restoring the old behavior), or alternatively by changing
`read_csv` to use `CsvReadOptions::default().has_header(false)` so that it
matches the write.
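To illustrate why the mismatch breaks the round trip and why either fix restores it, here is a minimal std-only sketch. It is a toy model, not DataFusion code: `write_csv` and `count_rows` are hypothetical stand-ins that only mimic the `has_header` flag on each side, and how DataFusion itself fails internally may differ.

```rust
// Toy model of the header mismatch using only the standard library.
// `write_csv` and `count_rows` are hypothetical helpers, not DataFusion APIs.

/// Serialize a single column, optionally emitting a header line first.
fn write_csv(name: &str, values: &[i64], has_header: bool) -> String {
    let mut out = String::new();
    if has_header {
        out.push_str(name);
        out.push('\n');
    }
    for v in values {
        out.push_str(&v.to_string());
        out.push('\n');
    }
    out
}

/// Count data rows, skipping the first line when the reader expects a header.
fn count_rows(csv: &str, has_header: bool) -> usize {
    let lines = csv.lines().filter(|l| !l.is_empty());
    if has_header {
        lines.skip(1).count()
    } else {
        lines.count()
    }
}

fn main() {
    // Mismatch: no header written, but the reader expects one,
    // so the only data row is consumed as the header.
    let csv = write_csv("col", &[1], false);
    println!("mismatched defaults: {} row(s)", count_rows(&csv, true));

    // Fix 1: write the header explicitly.
    let csv = write_csv("col", &[1], true);
    println!("header on both sides: {} row(s)", count_rows(&csv, true));

    // Fix 2: tell the reader there is no header.
    let csv = write_csv("col", &[1], false);
    println!("no header on either side: {} row(s)", count_rows(&csv, false));
}
```

In this model the mismatched case loses the single data row, while both matched configurations keep it, which is the same shape as the failing `count` assertion in the test above.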
I don't have a strong preference for which default wins, though I find the
mismatch somewhat surprising. Mainly I wanted to flag this in case the change
was not expected or intentional.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]