[GitHub] [arrow] westonpace commented on a change in pull request #10191: [R] [WIP] Use InMemoryDataset for Table/RecordBatch in dplyr code

GitBox Mon, 03 May 2021 14:14:29 -0700


westonpace commented on a change in pull request #10191:
URL: https://github.com/apache/arrow/pull/10191#discussion_r625372419




##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -344,20 +344,21 @@ test_that("print a mutated table", {
       select(int) %>%
       mutate(twice = int * 2) %>%
       print(),
-'Table (query)
+'InMemoryDataset (query)
 int: int32
 twice: expr
 
 See $.data for the source Arrow object',
   fixed = TRUE)
 
   # Handling non-expressions/edge cases
+  skip("InMemoryDataset$Project() doesn't accept array (or could it?)")
   expect_output(
     Table$create(tbl) %>%
       select(int) %>%
       mutate(again = 1:10) %>%

Review comment:
       None of the examples on 
https://dplyr.tidyverse.org/reference/mutate.html actually use this form, and I 
have a hard time understanding why someone might want to do this.
   
   Furthermore, this question 
https://stackoverflow.com/questions/60582562/another-length-error-using-dplyr-mutate-and-if-else
 shows some of the confusion you run into with something like this.
   
   From a SQL perspective, the proper way to add a new column would be a 
join.  Passing a bare vector is sort of a "join without a common key", which 
raises a few eyebrows in this question: 
https://stackoverflow.com/questions/1198124/combine-two-tables-that-have-no-common-fields
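   
   As a rough sketch of what I mean by joining instead, here is a base R 
version. The `id` key column and the `tbl`/`extra` data frames are made up for 
illustration and are not part of this PR:

```r
# Hypothetical data: `id` is an explicit key that both inputs share.
tbl <- data.frame(id = 1:3, int = c(5L, 6L, 7L))

# The new column arrives as its own table; its rows are deliberately out of order.
extra <- data.frame(id = c(3L, 1L, 2L), again = c(30L, 10L, 20L))

# The key, not the row position, decides which value lands on which row.
joined <- merge(tbl, extra, by = "id")
print(joined)
```

   Because the match is made on `id`, neither input has to be in any 
particular order.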
   
   Also, would the vector be the same length as a single batch, or as the 
entire table?  If it's the entire table, then the table has to be processed 
in order, which is undesirable as well.
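   
   To make the ordering concern concrete, here is a minimal base R sketch. It 
uses plain data frames as hypothetical stand-ins for record batches and is not 
how Arrow actually implements this:

```r
# Hypothetical stand-in for a 10-row table stored as two batches.
batches <- list(
  data.frame(int = 1:6),
  data.frame(int = 7:10)
)
again <- 1:10  # vector sized to the whole table, as in mutate(again = 1:10)

# Each batch needs its row offset within the table, which is only known
# after the sizes of all earlier batches have been counted.
sizes <- vapply(batches, nrow, integer(1))
offsets <- cumsum(c(0L, head(sizes, -1)))

with_again <- Map(function(batch, offset) {
  rows <- seq_len(nrow(batch)) + offset  # this batch's rows in table order
  batch$again <- again[rows]
  batch
}, batches, offsets)

result <- do.call(rbind, with_again)
```

   The offsets depend on the batch order, so the batches can no longer be 
handled independently or in an arbitrary order.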
   
   I think I'd want to see a valid use case before investing effort.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
