eitsupi commented on PR #34825:
URL: https://github.com/apache/arrow/pull/34825#issuecomment-1496137992
If I remember correctly, `read_feather` and `read_parquet` restore the R
attributes as they were when the data was written, so unless a tibble was
written to the Arrow file or Parquet file, the result is not a tibble, I think.
Feather V1, on the other hand, always returned a tibble, I think.
```r
> data.table::data.table(a = 1) |> arrow::write_parquet("test.parquet")
> arrow::read_parquet("test.parquet") |> class()
[1] "data.table" "data.frame"
> data.table::data.table(a = 1) |> arrow::write_feather("test.arrow")
> arrow::read_feather("test.arrow") |> class()
[1] "data.table" "data.frame"
> data.table::data.table(a = 1) |> arrow::write_feather("test.feather", version = "1")
> arrow::read_feather("test.feather") |> class()
[1] "tbl_df" "tbl" "data.frame"
```
And since dplyr often returns a tibble, it makes sense that calling
`collect` would also return a tibble.
> If data.table wants fread on IPC streams or feather, nanoarrow will
> probably be a better long-term solution (IPC support is in the C library
> although I haven't had time to do an R wrapper yet).
This is definitely great!
I just wanted to point out that there may be users of data.table.