[
https://issues.apache.org/jira/browse/ARROW-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409141#comment-17409141
]
Nic Crane commented on ARROW-13860:
-----------------------------------
[~icook] I'm not sure, but I ran a few things in Arrow 4.1. Make what you will of
the output below; I think it might answer your question.
{code:r}
> iris %>% group_by(Species) %>% record_batch()
RecordBatch
150 rows x 5 columns
$Sepal.Length <double>
$Sepal.Width <double>
$Petal.Length <double>
$Petal.Width <double>
$Species <dictionary<values=string, indices=int8>>
See $metadata for additional Schema metadata
> iris %>% group_by(Species) %>% record_batch() %>% collect()
# A tibble: 150 x 5
# Groups: Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 *        <dbl>       <dbl>        <dbl>       <dbl> <fct>
 1          5.1         3.5          1.4         0.2 setosa
 2          4.9         3            1.4         0.2 setosa
 3          4.7         3.2          1.3         0.2 setosa
 4          4.6         3.1          1.5         0.2 setosa
 5          5           3.6          1.4         0.2 setosa
 6          5.4         3.9          1.7         0.4 setosa
 7          4.6         3.4          1.4         0.3 setosa
 8          5           3.4          1.5         0.2 setosa
 9          4.4         2.9          1.4         0.2 setosa
10          4.9         3.1          1.5         0.1 setosa
# … with 140 more rows
> iris %>% record_batch() %>% group_by(Species)
RecordBatch (query)
Sepal.Length: double
Sepal.Width: double
Petal.Length: double
Petal.Width: double
Species: dictionary<values=string, indices=int8>
* Grouped by Species
See $.data for the source Arrow object
> iris %>% record_batch() %>% group_by(Species) %>% collect()
# A tibble: 150 x 5
# Groups: Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>
 1          5.1         3.5          1.4         0.2 setosa
 2          4.9         3            1.4         0.2 setosa
 3          4.7         3.2          1.3         0.2 setosa
 4          4.6         3.1          1.5         0.2 setosa
 5          5           3.6          1.4         0.2 setosa
 6          5.4         3.9          1.7         0.4 setosa
 7          4.6         3.4          1.4         0.3 setosa
 8          5           3.4          1.5         0.2 setosa
 9          4.4         2.9          1.4         0.2 setosa
10          4.9         3.1          1.5         0.1 setosa
# … with 140 more rows
{code}
> [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
> ---------------------------------------------------------------------
>
> Key: ARROW-13860
> URL: https://issues.apache.org/jira/browse/ARROW-13860
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Environment: macOS 11.1 Big Sur
> Reporter: Hideaki Hayashi
> Priority: Major
>
> arrow 5.0.0 write_parquet throws error writing grouped data.frame.
> Here is how to reproduce it.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{# Error: x must be an object of class 'data.frame', 'RecordBatch', or
> 'Table', not 'arrow_dplyr_query'.}}
>
> With arrow 4.0.1, this used to work fine.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{x <- arrow::read_parquet("/tmp/mtcars_test.parquet")}}
> {{x}}
> {{# A tibble: 32 x 11}}
> {{# Groups: am [2]}}
> {{# mpg cyl disp hp drat wt qsec vs am gear carb}}
> {{# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>}}
> {{# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4}}
> {{# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4}}
> {{# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1}}
> {{# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1}}
> {{# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2}}
> {{# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1}}
> {{# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4}}
> {{# …}}
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)