romainfrancois commented on pull request #11369:
URL: https://github.com/apache/arrow/pull/11369#issuecomment-952078563
🤔 I need to have a look at how `write_parquet()` works. It might also be
that, because string vectors don't have `Get_region()`, we are forced to get them
one element at a time, and so keep paying the same cost over and over again.
```r
library(arrow, warn.conflicts = FALSE)
#> See arrow_info() for available features
library(purrr)
df <- data.frame(col_letters = sample(LETTERS, 22180168, replace = TRUE))
write_parquet(df, "df.parquet")
# materialize all at once
df_rt <- read_parquet("df.parquet")
system.time(df_rt$col_letters[])
#> user system elapsed
#> 0.466 0.033 0.499
# materialize one at a time
df_rt <- read_parquet("df.parquet")
system.time(for(i in seq_along(df_rt$col_letters)) df_rt$col_letters[i])
#> user system elapsed
#> 78.459 5.364 84.029
# the write here hangs
# write_parquet(df_rt, "df_again.parquet")
```
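One possible way to check this hypothesis, as a rough sketch only: run the element-wise loop on a smaller sample, once on a freshly read column and once after forcing materialization with `[]`. The row count `n` and the temporary file path are arbitrary choices. That `[]` leaves later element access cheap is an assumption to verify, not something established above.

```r
library(arrow, warn.conflicts = FALSE)

# smaller sample so the element-wise loop finishes quickly (n is arbitrary)
n <- 1e5
df <- data.frame(col_letters = sample(LETTERS, n, replace = TRUE))
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# element-wise access on a freshly read column
x <- read_parquet(path)$col_letters
t_lazy <- system.time(for (i in seq_along(x)) x[i])

# same loop after forcing materialization first
# (assumption: `[]` materializes the whole vector, as in the timing above)
y <- read_parquet(path)$col_letters
y <- y[]
t_forced <- system.time(for (i in seq_along(y)) y[i])

# compare elapsed times side by side
rbind(lazy = t_lazy, forced = t_forced)[, "elapsed"]
```

If the forced version is much faster, that would point at per-element access as the cost being paid repeatedly. It would not by itself explain the hang in `write_parquet()`.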
I'll investigate.