[ https://issues.apache.org/jira/browse/ARROW-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Patrick Kyle updated ARROW-6819:
-------------------------------------
    Description:
I am currently using v0.15.0 of the arrow package, installed from source using CRAN. I also have v1.0.4 of the sparklyr package installed. While attempting to read in Parquet data with both packages attached, the read_parquet function appears to ignore the as_data_frame argument (which defaults to TRUE).

[https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]

I am not certain, but I suspect the issue may be in the way Table__to_dataframe coerces Arrow Table objects into tibbles, since this statement appears also to produce a tibble (I expected a data.frame to be returned):

{{arrow:::Table__to_dataframe(tab, use_threads=FALSE)}}

A reproducible example follows.

{{# This does work as expected, returns data.frame}}
{{library(arrow)}}
{{temp <- tempfile()}}
{{download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp)}}
{{read_parquet(temp, as_data_frame=TRUE)}}

{{# This does not work as expected, returns tibble}}
{{library(sparklyr)}}
{{read_parquet(temp, as_data_frame=TRUE)}}

  was:
I am currently using v0.15.0 of the arrow package, installed from source using CRAN. I also have v1.0.4 of the sparklyr package installed. While attempting to read in Parquet data with both packages attached, the read_parquet function appears to ignore the as_data_frame argument (which defaults to TRUE).

[https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47]

I am not certain, but I suspect the issue may be in the way Table__to_dataframe coerces Arrow Table objects into tibbles, since this statement appears also to produce a tibble (I expected a data.frame to be returned):

{{arrow:::Table__to_dataframe(tab, use_threads=FALSE)}}

A reproducible example follows.

{{# This does work as expected, returns data.frame}}
{{library(arrow)}}
{{temp <- tempfile()}}
{{download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp)}}
{{read_parquet(temp, as_data_frame=TRUE)}}

{{# This does not work as expected, returns tibble}}
{{library(sparklyr)}}
{{library(arrow)}}
{{read_parquet(temp, as_data_frame=TRUE)}}


> arrow::read_parquet ignores as_data_frame when sparklyr package is attached
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-6819
>                 URL: https://issues.apache.org/jira/browse/ARROW-6819
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.15.0
>         Environment: R version 3.6.1 (2019-07-05) on x86_64, darwin15.6.0 (Mac OS 10.13.4)
>            Reporter: Ryan Patrick Kyle
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)