[jira] [Commented] (ARROW-13860) [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame

Nic Crane (Jira) Thu, 02 Sep 2021 15:31:36 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409141#comment-17409141
 ]


Nic Crane commented on ARROW-13860:
-----------------------------------

[~icook] I'm not sure, but I ran a few things in Arrow 4.1. Make of the output 
below what you will; I think it might answer your question.


{code:java}
> iris %>% group_by(Species) %>% record_batch() 

RecordBatch
150 rows x 5 columns
$Sepal.Length <double>
$Sepal.Width <double>
$Petal.Length <double>
$Petal.Width <double>
$Species <dictionary<values=string, indices=int8>>

See $metadata for additional Schema metadata

> iris %>% group_by(Species) %>% record_batch() %>% collect()
# A tibble: 150 x 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 *        <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

> iris %>% record_batch() %>% group_by(Species) 
RecordBatch (query)
Sepal.Length: double
Sepal.Width: double
Petal.Length: double
Petal.Width: double
Species: dictionary<values=string, indices=int8>

* Grouped by Species
See $.data for the source Arrow object

> iris %>% record_batch() %>% group_by(Species)  %>% collect()
# A tibble: 150 x 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

{code}


> [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13860
>                 URL: https://issues.apache.org/jira/browse/ARROW-13860
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>         Environment: macOS 11.1 Big Sur
>            Reporter: Hideaki Hayashi
>            Priority: Major
>
> arrow 5.0.0 write_parquet throws error writing grouped data.frame.
> Here is how to reproduce it.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{# Error: x must be an object of class 'data.frame', 'RecordBatch', or 
> 'Table', not 'arrow_dplyr_query'.}}
>  
> With arrow 4.0.1, this used to work fine.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{x <- arrow::read_parquet("/tmp/mtcars_test.parquet")}}
> {{x}}
> {{# A tibble: 32 x 11}}
> {{# Groups:   am [2]}}
> {{#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb}}
> {{# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>}}
> {{# 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4}}
> {{# 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4}}
> {{# 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1}}
> {{# 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1}}
> {{# 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2}}
> {{# 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1}}
> {{# 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4}}
> {{# …}}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
