[ 
https://issues.apache.org/jira/browse/ARROW-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Patrick Kyle updated ARROW-6819:
-------------------------------------
    Description: 
I am currently using v0.15.0 of the arrow package, installed from source using 
CRAN. I also have v1.0.4 of the sparklyr package installed. While attempting to 
read in Parquet data with both packages attached, the read_parquet function 
appears to ignore the as_data_frame argument (which defaults to TRUE).

[https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]

I am not certain, but I suspect the issue may be in the way Table__to_dataframe 
coerces Arrow Table objects into tibbles, since this statement appears also to 
produce a tibble (I expected a data.frame to be returned):

{{arrow:::Table__to_dataframe(tab, use_threads=FALSE)}}

 

A reproducible example follows.

 

{{# This does work as expected, returns data.frame}}

{{library(arrow)}}

{{temp <- tempfile()}}
 
{{download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp)}}

{{read_parquet(temp, as_data_frame=TRUE)}}

{{# This does not work as expected, returns tibble}}

{{library(sparklyr)}}

{{read_parquet(temp, as_data_frame=TRUE)}}

  was:
I am currently using v0.15.0 of the arrow package, installed from source using 
CRAN. I also have v1.0.4 of the sparklyr package installed. While attempting to 
read in Parquet data with both packages attached, the read_parquet function 
appears to ignore the as_data_frame argument (which defaults to TRUE).

[https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]

I am not certain, but I suspect the issue may be in the way Table__to_dataframe 
coerces Arrow Table objects into tibbles, since this statement appears also to 
produce a tibble (I expected a data.frame to be returned):

{{arrow:::Table__to_dataframe(tab, use_threads=FALSE)}}

 

A reproducible example follows.

 

{{# This does work as expected, returns data.frame}}

{{library(arrow)}}

{{temp <- tempfile()}}
 
{{download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp)}}

{{read_parquet(temp, as_data_frame=TRUE)}}

{{# This does not work as expected, returns tibble}}

{{library(sparklyr)}}

{{library(arrow)}}

{{read_parquet(temp, as_data_frame=TRUE)}}


> arrow::read_parquet ignores as_data_frame when sparklyr package is attached
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-6819
>                 URL: https://issues.apache.org/jira/browse/ARROW-6819
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.0
>         Environment: R version 3.6.1 (2019-07-05) on x86_64, darwin15.6.0 
> (Mac OS 10.13.4)
>            Reporter: Ryan Patrick Kyle
>            Priority: Major
>
> I am currently using v0.15.0 of the arrow package, installed from source 
> using CRAN. I also have v1.0.4 of the sparklyr package installed. While 
> attempting to read in Parquet data with both packages attached, the 
> read_parquet function appears to ignore the as_data_frame argument (which 
> defaults to TRUE).
> [https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]
> I am not certain, but I suspect the issue may be in the way 
> Table__to_dataframe coerces Arrow Table objects into tibbles, since this 
> statement appears also to produce a tibble (I expected a data.frame to be 
> returned):
> {{arrow:::Table__to_dataframe(tab, use_threads=FALSE)}}
>  
> A reproducible example follows.
>  
> {{# This does work as expected, returns data.frame}}
> {{library(arrow)}}
> {{temp <- tempfile()}}
>  
> {{download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp)}}
> {{read_parquet(temp, as_data_frame=TRUE)}}
> {{# This does not work as expected, returns tibble}}
> {{library(sparklyr)}}
> {{read_parquet(temp, as_data_frame=TRUE)}}
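
As a diagnostic aid for the report above, the following sketch checks an object's class vector instead of relying on how it prints, since printed output in R can vary with the S3 print methods registered in the session. This is only an illustration under stated assumptions: the helper describe_frame and the stand-in objects are hypothetical and are not part of arrow or sparklyr. In a real session one would pass the value returned by read_parquet() instead.

```r
# Hypothetical helper (not part of arrow or sparklyr): classify an object
# as a tibble or a plain data.frame by inspecting its class vector.
describe_frame <- function(x) {
  if (inherits(x, "tbl_df")) {
    "tibble"
  } else if (is.data.frame(x)) {
    "data.frame"
  } else {
    "other"
  }
}

# Stand-ins for read_parquet() results, built in base R so the sketch
# runs without arrow, sparklyr, or tibble installed.
plain <- data.frame(id = 1:3, name = c("a", "b", "c"))
tbl_like <- structure(plain, class = c("tbl_df", "tbl", "data.frame"))

describe_frame(plain)     # "data.frame"
describe_frame(tbl_like)  # "tibble"

# as.data.frame() drops the subclasses that precede "data.frame",
# which yields a plain data.frame regardless of the original class.
plain_again <- as.data.frame(tbl_like)
class(plain_again)        # "data.frame"
```

Running describe_frame() on the result of each read_parquet() call in the example, before and after attaching sparklyr, would show whether the returned class actually changes or only the printed representation does.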



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
