lucasmation commented on issue #40423:
URL: https://github.com/apache/arrow/issues/40423#issuecomment-2407104822

   any updates on this? 
   
   I am having a possibly related problem when trying to save a data.table to a 
parquet file
   
   ```
   library(tidyverse)
   library(data.table)
   library(arrow)  # version 17.0.0.1
   options(arrow.use_dt = TRUE)
   
   # 1m sample
   d[1:(10^6)] %>% write_dataset('C:/test1')
   t1 <- open_dataset('C:/test1') %>% collect()
   class(t1)
   # "data.table" "data.frame"

   # 10m sample
   d[1:(10*10^6)] %>% write_dataset('C:/test2')
   t2 <- open_dataset('C:/test2') %>% collect()
   class(t2)
   # "data.table" "data.frame"

   # 20m sample
   d[1:(20*10^6)] %>% write_dataset('C:/test3')
   t3 <- open_dataset('C:/test3') %>% collect()
   class(t3)
   # "data.table" "data.frame"

   # Full data (270m obs)
   d %>% write_dataset('C:/test4')
   # Warning messages:
   # 1: Invalid metadata$r
   # 2: Invalid metadata$r
   # 3: Invalid metadata$r
   t4 <- open_dataset('C:/test4') %>% collect()
   class(t4)
   # "tbl_df"     "tbl"        "data.frame"
   ```
   
   The odd thing is that for the smaller samples the parquet dataset is written without warnings, and `open_dataset() %>% collect()` returns a data.table as expected.
   
   However, with the full dataset (270m rows), `write_dataset()` raises three "Invalid metadata$r" warnings, and `open_dataset() %>% collect()` returns a plain tibble (`"tbl_df" "tbl" "data.frame"`) instead.
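   In case it helps anyone hitting the same thing, a possible interim workaround (just a sketch, assuming the collected tibble `t4` has the right data and only the class attribute is lost) is to restore the data.table class by reference after collecting:
   
   ```r
   library(arrow)
   library(dplyr)
   library(data.table)
   
   # Collect the dataset (comes back as a tibble when the r metadata is
   # dropped), then convert to data.table in place. setDT() changes the
   # class by reference without copying the columns, so it stays cheap
   # even at ~270m rows.
   t4 <- open_dataset('C:/test4') %>% collect()
   setDT(t4)
   class(t4)
   # "data.table" "data.frame"
   ```
   
   This only papers over the symptom, though; the `Invalid metadata$r` warnings on write would still be worth understanding.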
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]