jonkeane commented on pull request #11369:
URL: https://github.com/apache/arrow/pull/11369#issuecomment-952063078
Ok, I've got a reproducer here. Ints alone didn't seem to hang, but strings
did. I haven't tried all nulls or floats yet, but can if that's helpful. The
number of rows (22,180,168) is what was in the fanniemae dataset. A smaller
number of rows does succeed, but above 10,000,000 or so rows it hangs.
```
> library(arrow, warn.conflicts = FALSE)
See arrow_info() for available features
>
> df <- data.frame(
+ col_letters = sample(LETTERS, 22180168, replace = TRUE)
+ )
>
> write_parquet(df, "df.parquet")
>
> df_rt <- read_parquet("df.parquet")
>
> # the write here hangs
> write_parquet(df_rt, "df_again.parquet")
```