nealrichardson commented on issue #35382:
URL: https://github.com/apache/arrow/issues/35382#issuecomment-1530113496

   Ah, I bet it works if you remove the `setDT()` step, which isn't needed 
anyway if you're writing to parquet. `dplyr` caches grouping information in a 
`groups` attribute, which can be huge (as you see) and is redundant with the 
data. We remove that attribute when we write the data, but because `setDT()` 
changes the class of the input, it's no longer a `grouped_df`, so we don't 
catch it: 
https://github.com/apache/arrow/blob/main/r/R/metadata.R#L149-L161
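
A minimal sketch of the workaround described above, assuming a small made-up data frame (`df`, `id`, `value`, and the output file name are hypothetical, not from the original report). It shows two ways to avoid carrying the `groups` attribute along: skip `setDT()` or ungroup first, or drop the attribute by hand if the object has already been converted.

```r
library(dplyr)
library(data.table)
library(arrow)

# Hypothetical example data, grouped with dplyr
df <- data.frame(id = rep(1:3, each = 2), value = 1:6)
grouped <- group_by(df, id)

# The grouping information lives in a "groups" attribute
"groups" %in% names(attributes(grouped))

# Option 1: ungroup before converting, so there is nothing left to carry over
ungrouped <- ungroup(grouped)
setDT(ungrouped)

# Option 2: if the object was already converted with setDT(),
# remove the leftover attribute by reference
dt <- copy(grouped)
setDT(dt)
setattr(dt, "groups", NULL)

# Either object can now be written without the cached grouping metadata
write_parquet(ungrouped, tempfile(fileext = ".parquet"))
```

If the data never needs to be a `data.table`, passing the `grouped_df` straight to `write_parquet()` should also work, since that is the case the linked code handles.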

