romainfrancois commented on pull request #11369:
URL: https://github.com/apache/arrow/pull/11369#issuecomment-952078563


   🤔 I need to have a look at how `write_parquet()` works. It might also be 
that because string vectors don't have `Get_region()`, we are forced to get them 
one element at a time, and so keep paying the same cost over and over again. 
   
   ```r
   library(arrow, warn.conflicts = FALSE)
   #> See arrow_info() for available features
   library(purrr)
   
   df <- data.frame(col_letters = sample(LETTERS, 22180168, replace = TRUE))
   write_parquet(df, "df.parquet")
   
   # materialize all at once
   df_rt <- read_parquet("df.parquet")
   system.time(df_rt$col_letters[])
   #>    user  system elapsed 
   #>   0.466   0.033   0.499
   
   # materialize one at a time
   df_rt <- read_parquet("df.parquet")
   system.time(for(i in seq_along(df_rt$col_letters)) df_rt$col_letters[i])
   #>    user  system elapsed 
   #>  78.459   5.364  84.029
   
   # the write here hangs
   # write_parquet(df_rt, "df_again.parquet")
   ```
   
   I'll investigate. 
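   
   For anyone who wants to poke at this without waiting a minute and a half, here is a scaled-down sketch of the same comparison. The column size `n`, the temporary file, and the variable names are my own choices for illustration, not anything from the PR. It only contrasts the two access patterns and does not exercise `write_parquet()` on the round-tripped data, so it says nothing definitive about where the hang comes from.
   
   ```r
   library(arrow, warn.conflicts = FALSE)

   # Assumption: a much smaller column than the reprex, so the loop finishes quickly
   n <- 1e5
   df <- data.frame(col_letters = sample(LETTERS, n, replace = TRUE))

   path <- tempfile(fileext = ".parquet")
   write_parquet(df, path)

   # Fresh read, then pull the whole column in a single call
   col_bulk <- read_parquet(path)$col_letters
   t_bulk <- system.time(bulk <- col_bulk[])[["elapsed"]]

   # Fresh read, then pull the column one element at a time
   col_elt <- read_parquet(path)$col_letters
   t_elt <- system.time({
     one_by_one <- character(n)
     for (i in seq_len(n)) one_by_one[i] <- col_elt[i]
   })[["elapsed"]]

   # Both access patterns should yield the same values; only the cost differs
   stopifnot(identical(as.character(bulk), one_by_one))
   print(c(bulk = t_bulk, element_wise = t_elt))
   ```
   
   If the per-element cost is the culprit, the gap between the two timings should widen as `n` grows.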


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


