[ https://issues.apache.org/jira/browse/ARROW-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-7639.
------------------------------------
Resolution: Fixed
Issue resolved by pull request 6258
[https://github.com/apache/arrow/pull/6258]
> [R] Cannot convert Dictionary Array to R when values aren't strings
> -------------------------------------------------------------------
>
> Key: ARROW-7639
> URL: https://issues.apache.org/jira/browse/ARROW-7639
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 0.15.1
> Environment: Ubuntu 16.04.5 LTS
> Reporter: Etienne Racine
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.16.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I got the following error in R when reading a Feather file prepared in Python
> with arrow::read_feather():
> {code:r}
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R{code}
> I could reproduce the issue with a minimal example:
> In Python:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"float": [0.1, .2, 0.5, .001]})
> df["category"] = df["float"].astype('category')
> df.dtypes
> #' float float64
> #' A object
> #' category category
> #' dtype: object
> df.to_feather("series.feather")
> pa.__version__
> #' '0.15.1'
> {code}
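For context, pandas stores the `category` column as integer codes into a table of unique values. pyarrow writes that column to Feather as a dictionary array, and the R error names its type: a `double` dictionary with `int8` indices. The pure-Python sketch below illustrates that layout for the example values. It assumes the categories are the sorted unique values, which is pandas' default. `dictionary_encode` and `dictionary_decode` are hypothetical helpers, not pandas or pyarrow APIs.

```python
from array import array


def dictionary_encode(values):
    """Hypothetical helper: split `values` into a dictionary of unique values
    (sorted, like pandas' default category order) and signed 8-bit indices."""
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    # Typecode "b" is a signed 8-bit integer, analogous to Arrow's int8 index
    # type. It raises OverflowError once an index exceeds 127; a real encoder
    # would switch to a wider index type instead.
    indices = array("b", (position[value] for value in values))
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Hypothetical helper: look each index up in the dictionary."""
    return [dictionary[i] for i in indices]


values = [0.1, .2, 0.5, .001]
dictionary, indices = dictionary_encode(values)
print(dictionary)     # [0.001, 0.1, 0.2, 0.5]  (the double "values")
print(list(indices))  # [1, 2, 3, 0]            (the int8 "indices")
assert dictionary_decode(dictionary, indices) == values
```

Decoding recovers the original floats exactly. This is why the column survives the round trip in Python: a pandas categorical can hold categories of any dtype.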
> From R:
> {code:r}
> arrow::read_feather("series.feather")
> #' Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
> #' Cannot convert Dictionary Array of type `dictionary<values=double, indices=int8, ordered=0>` to R
> #' Backtrace:
> #' █
> #' 1. └─arrow::read_feather("series.feather")
> #' 2. ├─[ base::as.data.frame(...) ]
> #' 3. └─arrow:::as.data.frame.Table(out)
> #' 4. └─arrow:::Table__to_dataframe(x, use_threads = option_use_threads())
> {code}
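The issue title suggests the cause: in 0.15.1 a dictionary array could become an R factor only when its values were strings, and R factor levels are always character. One way a converter could handle other value types is sketched below in Python. This is an assumption and not necessarily what pull request 6258 does. The converter would coerce the dictionary values to strings for the levels and shift Arrow's 0-based indices to R's 1-based factor codes. `to_r_factor` is a hypothetical name.

```python
def to_r_factor(dictionary, indices):
    """Hypothetical sketch: build (levels, codes) the way an R factor stores them.

    R factor levels are always character, so non-string dictionary values are
    coerced with str(). R factor codes are 1-based; Arrow indices are 0-based.
    A None index marks a null and stays None (standing in for R's NA).
    """
    levels = [v if isinstance(v, str) else str(v) for v in dictionary]
    codes = [None if i is None else i + 1 for i in indices]
    return levels, codes


# Dictionary and indices pandas would produce for [0.1, .2, 0.5, .001]
levels, codes = to_r_factor([0.001, 0.1, 0.2, 0.5], [1, 2, 3, 0])
print(levels)  # ['0.001', '0.1', '0.2', '0.5']
print(codes)   # [2, 3, 4, 1]
```

Python's `str()` and R's `as.character()` do not always format doubles identically, so an actual converter would coerce on the R side. Coercing to levels also discards the numeric type. The alternative would be to decode the dictionary into a plain double vector.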
> The Feather file is read back correctly in Python:
> {code:python}
> ft = pd.read_feather("series.feather")
> ft.dtypes
> #' float float64
> #' A object
> #' category category
> #' dtype: object
> {code}
> {code:r}
> sessionInfo()
> #' R version 3.5.1 (2018-07-02)
> #' Platform: x86_64-conda_cos6-linux-gnu (64-bit)
> #' Running under: Ubuntu 16.04.5 LTS
> #'
> #' Matrix products: default
> #' BLAS/LAPACK: /misc/DLshare/home/etbellem/miniconda3/lib/R/lib/libRblas.so
> #'
> #' locale:
> #' [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> #' [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> #' [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> #' [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> #' [9] LC_ADDRESS=C LC_TELEPHONE=C
> #' [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> #'
> #' attached base packages:
> #' [1] stats graphics grDevices utils datasets methods base
> #'
> #' loaded via a namespace (and not attached):
> #' [1] Rcpp_1.0.3 arrow_0.15.1 crayon_1.3.4 assertthat_0.2.1
> #' [5] R6_2.4.1 magrittr_1.5 rlang_0.4.2 rstudioapi_0.10
> #' [9] bit64_0.9-7 glue_1.3.1 purrr_0.3.3 bit_1.1-15.1
> #' [13] compiler_3.5.1 tidyselect_0.2.5{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)