[GitHub] [arrow] thisisnic commented on a change in pull request #12073: ARROW-14919: [R] write_parquet() drops attributes for grouped dataframes

GitBox Thu, 06 Jan 2022 07:56:50 -0800


thisisnic commented on a change in pull request #12073:
URL: https://github.com/apache/arrow/pull/12073#discussion_r779651711




##########
File path: r/R/metadata.R
##########
@@ -133,24 +133,24 @@ remove_attributes <- function(x) {
 }
 
 arrow_attributes <- function(x, only_top_level = FALSE) {
+
+  att <- attributes(x)
+  removed_attributes <- remove_attributes(x)
+
   if (inherits(x, "grouped_df")) {
     # Keep only the group var names, not the rest of the cached data that dplyr
     # uses, which may be large
     if (requireNamespace("dplyr", quietly = TRUE)) {
       gv <- dplyr::group_vars(x)
       x <- dplyr::ungroup(x)
       # ungroup() first, then set attribute, bc ungroup() would erase it
-      attr(x, ".group_vars") <- gv
-    } else {
-      # Regardless, we shouldn't keep groups around
-      attr(x, "groups") <- NULL
+      att[[".group_vars"]] <- gv
+      removed_attributes <- c(removed_attributes, "groups", "class")

Review comment:
When `remove_attributes()` is called on `x`, it hits this block of code:

```
if (identical(class(x), c("tbl_df", "tbl", "data.frame"))) {
  removed_attributes <- c("class", "row.names", "names")
} else if (inherits(x, "data.frame")) {
  removed_attributes <- c("row.names", "names")
```

Because `x` is a `grouped_df`, its class vector is not identical to
`c("tbl_df", "tbl", "data.frame")`, so it fails the first condition and falls
through to the second. That branch doesn't include "class", for reasons I
don't fully understand, but adding it there makes other tests fail and changes
the returned object type. I've tried a few other approaches, but they also
make other tests fail, and I'm going down a bit of a rabbit hole.
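To make the branching concrete, here is a minimal sketch of the behaviour described above. `removed_for()` is a hypothetical helper that mirrors only the two quoted branches; it is not the actual `remove_attributes()` implementation. The class vectors are set by hand so the example runs in base R without dplyr or tibble installed.

```r
# Hypothetical helper mirroring only the two branches quoted in the comment;
# this is an illustration, not arrow's remove_attributes().
removed_for <- function(x) {
  if (identical(class(x), c("tbl_df", "tbl", "data.frame"))) {
    c("class", "row.names", "names")
  } else if (inherits(x, "data.frame")) {
    c("row.names", "names")
  } else {
    character()
  }
}

df <- data.frame(a = 1:3, g = c("x", "x", "y"))

# Assumption: these class vectors match what tibble and dplyr assign to a
# plain tibble and a grouped tibble respectively.
plain_tbl <- structure(df, class = c("tbl_df", "tbl", "data.frame"))
grouped <- structure(df, class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

tbl_result <- removed_for(plain_tbl)    # exact class match: first branch
grouped_result <- removed_for(grouped)  # extra "grouped_df" class: second branch
```

The exact-match `identical()` test is what excludes the grouped case: `inherits()` only asks whether "data.frame" appears somewhere in the class vector, so "class" is never added to the removed set for a `grouped_df` unless it is appended explicitly, as the diff does.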



