ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629647107
##########
File path: r/R/record-batch.R
##########

```diff
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens == 1)) {
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~ if (.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x
```

Review comment:

Looking at this a bit more, I think it might require some challenging C++ coding to implement `combine_chunks()` like I described. Perhaps it should be dealt with in a separate Jira and a separate PR. In the meantime, you could avoid calling `as.vector()` by handling `ChunkedArray` explicitly here, taking the first chunk before passing it to `Scalar$create()`:

```r
if (inherits(.x, "ChunkedArray")) {
  .x <- .x$chunk(0)
}
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
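For context, folding the reviewer's suggestion into the recycling logic from the diff might look roughly like the sketch below. This is only an illustration, not code from the PR: the helper `recycle_to_length()` is a hypothetical name, and it assumes (as the suggestion implies) that `Scalar$create()` can accept the single `Array` chunk directly, so the `as.vector()` round trip through R is avoided. `map_int()`/`modify2()` are the purrr-style helpers already used in the diff, and `MakeArrayFromScalar()` is the package-internal binding it calls.

```r
# Hypothetical helper: recycle a length-1 array-like object to max_array_len.
# Assumes Scalar$create() accepts an Array chunk, per the review suggestion.
recycle_to_length <- function(.x, max_array_len) {
  # A length-1 ChunkedArray holds its value in a single chunk; take that
  # chunk so Scalar$create() never sees an R vector conversion.
  if (inherits(.x, "ChunkedArray")) {
    .x <- .x$chunk(0)
  }
  MakeArrayFromScalar(Scalar$create(.x), max_array_len)
}

arr_lens <- map_int(arrays, length)
if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens == 1)) {
  max_array_len <- max(arr_lens)
  arrays <- modify2(
    arrays,
    arr_lens == 1,
    ~ if (.y) recycle_to_length(.x, max_array_len) else .x
  )
}
```

Keeping the `ChunkedArray` branch inside a named helper rather than inline in the lambda is just a readability choice for this sketch; the PR keeps the logic inline.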