nealrichardson commented on issue #35382: URL: https://github.com/apache/arrow/issues/35382#issuecomment-1530113496
Ah, I bet it works if you remove the `setDT()` step, which isn't needed anyway if you're writing to Parquet. `dplyr` caches grouping information in a `groups` attribute, which can be huge (as you see) and is redundant with the data. We remove that attribute when we write the data, but because `setDT()` changes the class of the input, it's no longer a `grouped_df`, so we don't catch it: https://github.com/apache/arrow/blob/main/r/R/metadata.R#L149-L161
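
To make the suggestion concrete, here is a minimal sketch of the two ways around it. The data frame and the names `df`, `grouped`, `plain`, and `path` are made up for illustration, and calling `ungroup()` explicitly is an assumed alternative that is not part of the original report:

```r
library(dplyr)
library(arrow)

# Toy data standing in for the real table (hypothetical)
df <- data.frame(g = rep(c("a", "b", "c"), each = 2), x = 1:6)
grouped <- group_by(df, g)

# dplyr keeps its cached grouping structure in the `groups` attribute
has_groups_attr <- "groups" %in% names(attributes(grouped))
is_grouped <- inherits(grouped, "grouped_df")

# Option 1: skip setDT() and write the grouped_df directly, so the
# writer still sees the `grouped_df` class described above
path <- tempfile(fileext = ".parquet")
write_parquet(grouped, path)

# Option 2 (assumed alternative): drop the grouping yourself before
# any class-changing conversion
plain <- ungroup(grouped)
groups_dropped <- !("groups" %in% names(attributes(plain)))
```

If you do need a `data.table` for other reasons, calling `ungroup()` before `setDT()` should keep the `groups` attribute out of whatever gets written afterwards.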
