[
https://issues.apache.org/jira/browse/ARROW-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962823#comment-16962823
]
Vidar Ingason commented on ARROW-7018:
--------------------------------------
Hi Neal
Here is a small code snippet that reproduces this issue.
{code:java}
library(tidyverse)
library(arrow)
df <- tibble(a = c("Veitingastaðir"),
             b = 10)
write_parquet(df, "test.parquet")
df_read <- read_parquet("test.parquet")
{code}
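A possible way to check whether this is an encoding problem, sketched in base R only. It assumes (this is not confirmed anywhere in this thread) that the strings reach write_parquet() in the Windows native encoding (Latin-1/CP1252) rather than in UTF-8, which is what Parquet string columns expect. The helper `to_utf8_cols` is a hypothetical name used for illustration; it is not part of arrow.

```r
# "ð" written as a \u escape, so R stores and marks this string as UTF-8
x <- "Veitingasta\u00f0ir"
Encoding(x)

# The same text re-encoded as Latin-1, as a Windows session might hold it (assumption)
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1")

# Compare the raw bytes: "ð" is two bytes (c3 b0) in UTF-8 and one byte (f0) in Latin-1
charToRaw(x)
charToRaw(x_latin1)

# The Latin-1 bytes are not valid UTF-8, so a UTF-8 reader cannot decode the "ð"
validUTF8(x_latin1)

# Hypothetical helper: convert every character column of a data frame to UTF-8
to_utf8_cols <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], enc2utf8)
  df
}

df_latin1 <- data.frame(a = x_latin1, b = 10, stringsAsFactors = FALSE)
df_utf8 <- to_utf8_cols(df_latin1)
validUTF8(df_utf8$a)
```

If the assumption holds, writing `to_utf8_cols(df)` instead of `df` may avoid the garbled character. This has not been tested against arrow 0.15.0.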
> Special characters as question mark in parquet files in R
> ---------------------------------------------------------
>
> Key: ARROW-7018
> URL: https://issues.apache.org/jira/browse/ARROW-7018
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 0.15.0
> Environment: I'm running R on Windows 10
> Reporter: Vidar Ingason
> Priority: Major
>
> Hello.
> I'm new to the arrow package in R and I'm having trouble with special
> characters (Icelandic). I have a large data set and everything is fine until
> I write the file to disk and read it in again (i.e. I use write_parquet() and
> then read_parquet()). When I read the data back into R, special characters
> turn into question marks, e.g. Veitingastaðir becomes Veitingasta�ir.
> This does not happen when I use .csv.
> Is there anything I can do when I write the .parquet file to disk or when I
> read it in to prevent this?