[ https://issues.apache.org/jira/browse/ARROW-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Jones updated ARROW-16085:
-------------------------------
    Description: 
 

The following fails:
{code:r}
library(arrow)
library(dplyr)  # provides %>% and collect()

sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)

ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)
ds <- c(ds1, ds2)
actual <- ds %>% collect()
{code}
{code:java}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541 
 child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152 
 value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180 
 maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
  fragments_it.ToVector()
{code}
 

  was:
 

The following fails:

{code:R}
sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)

ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)
ds <- c(ds1, ds2)
actual <- ds %>% collect()
{code}

{code}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541 
 child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152 
 value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180 
 maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
  fragments_it.ToVector()
{code}

If we fixed this, we could implement a function that does for Tables what 
{{dplyr::bind_rows}} does for Tibbles:

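{code:R}
# Proposed helper; it depends on the failure above being fixed.
concat_tables <- function(..., schema = NULL) {
  tables <- rlang::list2(...)

  # Wrap each Table in an InMemoryDataset and open them as one dataset.
  datasets <- purrr::map(tables, InMemoryDataset$create)
  dataset <- open_dataset(datasets, schema = schema)

  # as_data_frame = FALSE returns an Arrow Table instead of a data frame.
  dplyr::collect(dataset, as_data_frame = FALSE)
}
{code}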
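
For reference, the {{dplyr::bind_rows}} behavior mentioned above can be seen on the equivalent tibbles. This is only a sketch of the column-union semantics a Table version might aim for, using plain dplyr; the names {{df1}}, {{df2}} and {{combined}} are illustrative and it says nothing about how the Arrow fix itself would work:

{code:r}
library(dplyr)

# Same data as sub_df1 and sub_df2, but as tibbles rather than Arrow Tables.
df1 <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- tibble(x = c(4, 5), z = c("d", "e"))

# bind_rows() takes the union of the columns (x, y, z) and fills the
# entries that are absent from one input with NA.
combined <- bind_rows(df1, df2)
print(combined)
{code}

The result has five rows: {{z}} is NA for the rows that came from {{df1}}, and {{y}} is NA for the rows that came from {{df2}}.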
 


> [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output
> ------------------------------------------------------------------
>
>                 Key: ARROW-16085
>                 URL: https://issues.apache.org/jira/browse/ARROW-16085
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>    Affects Versions: 7.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
