[ https://issues.apache.org/jira/browse/ARROW-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones updated ARROW-16085:
-------------------------------
Description:
The following fails:
{code:r}
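library(arrow)
library(dplyr)  # provides %>% and collect()

# Two tables that share column x but differ in their second column (y vs. z).
sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)
ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)

# Collecting the combined dataset raises the error shown below.
ds <- c(ds1, ds2)
actual <- ds %>% collect()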
{code}
{code:java}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541
child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152
value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180
maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
fragments_it.ToVector()
{code}
was:
The following fails:
{code:R}
sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)
ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)
ds <- c(ds1, ds2)
actual <- ds %>% collect()
{code}
{code}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541
child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152
value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180
maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
fragments_it.ToVector()
{code}
If we fixed this, we could implement a function that does for Tables what
{{dplyr::bind_rows}} does for Tibbles:
{code:R}
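# Sketch: combine Tables with differing columns, as dplyr::bind_rows() does for tibbles.
concat_tables <- function(..., schema = NULL) {
  tables <- rlang::list2(...)
  # Wrap each Table in an InMemoryDataset and open them as a single dataset.
  dataset <- open_dataset(purrr::map(tables, InMemoryDataset$create), schema = schema)
  # Return an Arrow Table rather than a data frame.
  dplyr::collect(dataset, as_data_frame = FALSE)
}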
{code}
> [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output
> ------------------------------------------------------------------
>
> Key: ARROW-16085
> URL: https://issues.apache.org/jira/browse/ARROW-16085
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Affects Versions: 7.0.0
> Reporter: Will Jones
> Assignee: Will Jones
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
>
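For context, here is a minimal sketch of where the three-column schema (x, y, z) in the error message may come from. It assumes that combining the two datasets unifies their schemas in the same way {{arrow::unify_schemas()}} does; that is my reading of the error text, not something confirmed in the issue. The variable name {{unified}} is only illustrative.

{code:r}
library(arrow)

# Same two tables as in the report: both have x, but one has y and the other z.
sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)

# Merge the two table schemas into one schema covering every column.
unified <- unify_schemas(sub_df1$schema, sub_df2$schema)

# Compare the unified column names with what each table actually holds.
names(unified)
names(sub_df1)
names(sub_df2)
{code}

If the assumption holds, each in-memory batch still carries only its original two columns while the combined dataset expects all three, which matches the mismatch reported in the type error.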