[ https://issues.apache.org/jira/browse/ARROW-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Jones updated ARROW-16085:
-------------------------------
    Summary: [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output (was: [R] Support unifying schemas for InMemoryDatasets)

> [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output
> ------------------------------------------------------------------
>
>                 Key: ARROW-16085
>                 URL: https://issues.apache.org/jira/browse/ARROW-16085
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 7.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Major
>             Fix For: 9.0.0
>
>
>  
> Collecting a union of two InMemoryDatasets whose schemas differ fails:
> {code:R}
> sub_df1 <- Table$create(
>   x = Array$create(c(1, 2, 3)),
>   y = Array$create(c("a", "b", "c"))
> )
> sub_df2 <- Table$create(
>   x = Array$create(c(4, 5)),
>   z = Array$create(c("d", "e"))
> )
> ds1 <- InMemoryDataset$create(sub_df1)
> ds2 <- InMemoryDataset$create(sub_df2)
> ds <- c(ds1, ds2)
> actual <- ds %>% collect()
> {code}
> {code}
> Type error: yielded batch had schema x: double
> y: string which did not match InMemorySource's: x: double
> y: string
> z: string
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541
>   child_.Next()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152
>   value_.status()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180
>   maybe_element
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
>   fragments_it.ToVector()
> {code}
> If we fixed this, we could implement a function that does for Tables what {{dplyr::bind_rows}} does for Tibbles:
> {code:R}
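> # Proposed helper: combine Tables row-wise under a single (optional) schema
> concat_tables <- function(..., schema = NULL) {
>   tables <- rlang::list2(...)
>   # Wrap each Table in an InMemoryDataset, then open them as one dataset
>   datasets <- purrr::map(tables, InMemoryDataset$create)
>   dataset <- open_dataset(datasets, schema = schema)
>   # Return an Arrow Table rather than a data frame
>   dplyr::collect(dataset, as_data_frame = FALSE)
> }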
> {code}
>  
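For reference, a minimal sketch of the {{dplyr::bind_rows}} behavior that the description compares against, using the same columns as the failing example. It assumes only that dplyr is installed; the names df1, df2, and combined are illustrative and do not appear in the issue.

```r
library(dplyr)

# Two data frames that share column x but differ in their second column,
# mirroring sub_df1 and sub_df2 from the failing example.
df1 <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- tibble(x = c(4, 5), z = c("d", "e"))

# bind_rows() matches columns by name and fills columns that are
# missing from an input with NA.
combined <- bind_rows(df1, df2)

print(combined)
```

The result has the union of the input columns (x, y, z), with NA wherever an input lacked a column. A Table-level equivalent would presumably fill those positions with nulls under the unified schema.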



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
