galachad commented on issue #44524:
URL: https://github.com/apache/arrow/issues/44524#issuecomment-2441150812
We have some legacy code that isn't compatible with `tibble`. Starting with
arrow version 13, the `data.frame` attributes are removed, so parquet files are
saved without a class. In the absence of the `class` attribute, Arrow reads
the file back as a `tibble` by default.
While I understand the intent to remove unnecessary attributes, the
assumption that `data.frame` is the default format is incorrect. The default
type for reading parquet files is actually `tibble`.
If the goal is to remove attributes for the default table type in R, we
should remove the class only when it is `c("tbl_df", "tbl", "data.frame")`, not
when it is just `"data.frame"`.
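To make the proposed condition concrete, here is a minimal sketch in base R. `strip_default_class()` and `default_read_class` are hypothetical names used only for illustration; this is not the code in `metadata.R`, just one way the check could look.

```r
# Hypothetical helper, NOT arrow's actual implementation: drop the class
# attribute only when it equals the class arrow assigns by default on read.
default_read_class <- c("tbl_df", "tbl", "data.frame")

strip_default_class <- function(attrs) {
  if (identical(attrs$class, default_read_class)) {
    attrs$class <- NULL
  }
  attrs
}

# A plain data.frame keeps its class entry in the attribute list ...
kept <- strip_default_class(attributes(mtcars))

# ... while the default tibble class vector is dropped.
dropped <- strip_default_class(list(class = default_read_class))
```

With this condition, a plain `data.frame` would keep its class in the saved metadata and could be restored as a `data.frame` on read.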
As I mentioned in the description, an extra workaround is now required to
avoid the class being modified. Unfortunately, `tibble` and `data.frame` are
not 100% compatible.
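As an illustration of the incompatibility, the sketch below shows one well-known difference (single-column subsetting) and one possible workaround, converting back with `as.data.frame()`. It assumes the `tibble` package is installed; the objects `df`, `tb`, and `restored` are made up for the example, and the workaround mentioned in the description may differ.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
# Stand-in for what comes back from reading a parquet file saved without a class.
tb <- as_tibble(df)

# Single-column subsetting: a data.frame drops to a plain vector,
# a tibble stays a tibble.
df_col <- df[, 1]
tb_col <- tb[, 1]

# Possible workaround (an assumption, not necessarily the one from the
# description): convert explicitly before handing data to legacy code.
restored <- as.data.frame(tb)
class(restored)
```

Legacy code that relies on `df[, 1]` returning a vector breaks on the tibble, which is why the explicit conversion is needed after every read.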
The example on which the assumption was based is probably incorrect:
```r
> class(arrow::arrow_table(name = "1", mtcars)$to_data_frame())
[1] "tbl_df" "tbl" "data.frame"
> class(arrow::arrow_table(mtcars, name = "1")$to_data_frame())
[1] "tbl_df" "tbl" "data.frame"
```
In this case, `mtcars` is a `data.frame` and should be represented as a
`data.frame`.
In my opinion, this bug is tiny and can be resolved simply by removing the
`remove the class if it's just data.frame` section:
https://github.com/apache/arrow/blob/7ef5437e23bd7d7571a0c7a7fc0c5d3634816802/r/R/metadata.R#L25-L31