[GitHub] [arrow] eitsupi commented on pull request #34825: GH-34775: [R] arrow_table: as.data.frame() sometimes returns a tbl and sometimes a data.frame

via GitHub Tue, 04 Apr 2023 08:06:29 -0700


eitsupi commented on PR #34825:
URL: https://github.com/apache/arrow/pull/34825#issuecomment-1496137992


   If I remember correctly, `read_feather` and `read_parquet` restore the R 
attributes that the object had when it was written, so unless a tibble was 
written to the Arrow or Parquet file, the result is not a tibble, I think.
   On the other hand, I think reading Feather V1 always returned a tibble.
   
   ```r
   > data.table::data.table(a = 1) |> arrow::write_parquet("test.parquet")
   
   > arrow::read_parquet("test.parquet") |> class()
   [1] "data.table" "data.frame"
   
   > data.table::data.table(a = 1) |> arrow::write_feather("test.arrow")
   
   > arrow::read_feather("test.arrow") |> class()
   [1] "data.table" "data.frame"
   
   > data.table::data.table(a = 1) |> arrow::write_feather("test.feather", version = "1")
   
   > arrow::read_feather("test.feather") |> class()
   [1] "tbl_df"     "tbl"        "data.frame"
   ```
   
   And since dplyr often returns a tibble, it makes sense that calling 
`collect` would also return a tibble.
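   
   For code that needs one specific class no matter what the reader or 
`collect` returns, explicit coercion is an option. A minimal sketch, assuming 
the tibble and data.table packages are installed; the object `x` below is a 
hypothetical stand-in for a returned result, not output from arrow itself:
   
   ```r
   # Sketch only: a tibble stands in for whatever a reader or collect() returned.
   x <- tibble::tibble(a = 1)

   # Coerce explicitly to the class the downstream code expects.
   df <- as.data.frame(x)              # plain data.frame
   dt <- data.table::as.data.table(x)  # data.table
   tb <- tibble::as_tibble(dt)         # back to a tibble

   class(df)  # "data.frame"
   class(dt)  # "data.table" "data.frame"
   class(tb)  # "tbl_df" "tbl" "data.frame"
   ```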
   
   > If data.table wants fread on IPC streams or feather, nanoarrow will 
probably be a better long-term solution (IPC support is in the C library 
although I haven't had time to do an R wrapper yet).
   
   This is definitely great!
   I just wanted to point out that there may be users of data.table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
