jonkeane commented on pull request #11369:
URL: https://github.com/apache/arrow/pull/11369#issuecomment-952072917


   Hmmm, actually one of those just finished. Maybe this is “just” paying the 
cost of reading from disk plus the Arrow-to-R conversion all at once on the 
write, but I’m surprised the second write here is _so much_ longer than the 
first: 
   
   ```
   > df <- data.frame(
   +   col_letters = sample(LETTERS, 10000000, replace = TRUE)
   + )
   > 
   > system.time({
   +   write_parquet(df, "df.parquet")
   + })
      user  system elapsed 
     0.633   0.042   0.681 
   > 
   > df_rt <- read_parquet("df.parquet")
   > 
   > system.time({ 
   +   write_parquet(df_rt, "df_again.parquet")
   + })
      user  system elapsed 
    94.758  17.734 114.312 
   ```
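
One way to check the hypothesis above might be to time the conversion and the write separately. This is only a rough sketch, not a definitive diagnosis: it assumes the `arrow` R package is attached, uses a smaller `n` than the report so it runs quickly, and the names `path`, `tbl`, `df_conv`, and the `t_*` timings are made up for illustration. `read_parquet(..., as_data_frame = FALSE)` returns an Arrow Table, so writing it should not involve any Arrow-to-R conversion.

```r
library(arrow)

# Smaller than the 10,000,000 rows in the report (assumption, for speed).
n <- 1e6
df <- data.frame(col_letters = sample(LETTERS, n, replace = TRUE))

path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Read back as an Arrow Table so no R vectors are created yet.
tbl <- read_parquet(path, as_data_frame = FALSE)

# 1. Write the Table directly: no Arrow-to-R conversion involved.
t_table <- system.time(write_parquet(tbl, tempfile(fileext = ".parquet")))

# 2. Time the Arrow-to-R conversion on its own.
t_convert <- system.time(df_conv <- as.data.frame(tbl))

# 3. Write the converted data frame, as in the slow case above.
t_df <- system.time(write_parquet(df_conv, tempfile(fileext = ".parquet")))

# Compare user / system / elapsed for the three steps.
print(rbind(table_write = t_table, convert = t_convert, df_write = t_df)[, 1:3])
```

If step 3 dominates while steps 1 and 2 stay small, the slowdown would seem to come from writing the round-tripped data frame itself rather than from the disk read or the conversion.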


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

