[ https://issues.apache.org/jira/browse/SPARK-30645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-30645:
------------------------------------

    Assignee: Maciej Szymkiewicz

> collect() support Unicode characters test fails on Windows
> ----------------------------------------------------------
>
>                 Key: SPARK-30645
>                 URL: https://issues.apache.org/jira/browse/SPARK-30645
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, Tests
>    Affects Versions: 3.0.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> As written, the [test_that("collect() support Unicode characters"|https://github.com/apache/spark/blob/d5b92b24c41b047c64a4d89cc4061ebf534f0995/R/pkg/tests/fulltests/test_sparkSQL.R#L850-L869] case appears to be system-dependent and does not work on Windows with the CP1252 English locale: the Korean, Chinese and Japanese strings end up in the file as literal {{<U+XXXX>}} text, and the {{Xin chào}} record is parsed as a corrupt record:
>
> {code:r}
> library(SparkR)
> SparkR::sparkR.session()
>
> Sys.info()
> #           sysname           release           version
> #         "Windows"      "Server x64"     "build 17763"
> #          nodename           machine             login
> # "WIN-5BLT6Q610KH"          "x86-64"   "Administrator"
> #              user    effective_user
> #   "Administrator"   "Administrator"
>
> Sys.getlocale()
> # [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
>
> lines <- c("{\"name\":\"안녕하세요\"}",
>            "{\"name\":\"您好\", \"age\":30}",
>            "{\"name\":\"こんにちは\", \"age\":19}",
>            "{\"name\":\"Xin chào\"}")
>
> # Write the test data; writeLines() translates the strings to the native (here CP1252) encoding
> jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
> writeLines(lines, jsonPath)
>
> # Show the raw file contents (requires a `cat` executable on the PATH)
> system(paste0("cat ", jsonPath))
> # {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"}
> # {"name":"<U+60A8><U+597D>", "age":30}
> # {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19}
> # {"name":"Xin chào"}
> # [1] 0
>
> df <- read.df(jsonPath, "json")
> printSchema(df)
> # root
> #  |-- _corrupt_record: string (nullable = true)
> #  |-- age: long (nullable = true)
> #  |-- name: string (nullable = true)
>
> head(df)
> #              _corrupt_record age                                     name
> # 1                       <NA>  NA <U+C548><U+B155><U+D558><U+C138><U+C694>
> # 2                       <NA>  30                         <U+60A8><U+597D>
> # 3                       <NA>  19 <U+3053><U+3093><U+306B><U+3061><U+306F>
> # 4 {"name":"Xin ch<U+FFFD>o"}  NA                                     <NA>
> {code}
>
> The problem becomes visible on AppVeyor once testthat is updated to 2.x, but is somehow silenced when testthat 1.x is used.
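The two symptoms in the output can be reproduced with base R alone. This is a minimal illustrative sketch, not part of the original report, and it rests on two assumptions: that {{writeLines()}} translates the strings to the native CP1252 encoding (consistent with the {{cat}} output), and that Spark's JSON source decodes the file as UTF-8 (consistent with the {{<U+FFFD>}} in {{_corrupt_record}}). The variable names are hypothetical. It shows that "à" becomes the single byte {{0xE0}}, which does not form valid UTF-8 here, while characters CP1252 cannot represent at all are replaced by {{<U+xxxx>}} text.

```r
# Illustrative sketch (base R only, R >= 3.3.0 for validUTF8()).
# Assumption: this mimics what writing the test strings under a CP1252
# locale does to them; it does not call SparkR.

# "Xin chào" with the accented letter as a Unicode escape, so this
# script stays ASCII-only; enc2utf8() guarantees UTF-8 storage.
vietnamese <- enc2utf8("Xin ch\u00e0o")

# Convert to CP1252 bytes: "à" maps to the single byte 0xE0.
cp1252_bytes <- iconv(vietnamese, from = "UTF-8", to = "CP1252",
                      toRaw = TRUE)[[1]]
print(cp1252_bytes)
# [1] 58 69 6e 20 63 68 e0 6f

# 0xE0 starts a 3-byte UTF-8 sequence, but "o" (0x6F) is not a valid
# continuation byte, so these bytes are not valid UTF-8 and a UTF-8
# decoder has to reject them or substitute U+FFFD.
print(validUTF8(rawToChar(cp1252_bytes)))
# [1] FALSE

# Korean text has no CP1252 representation; with sub = "Unicode", iconv()
# replaces each character with "<U+xxxx>" text, like the file contents.
korean <- enc2utf8("\uc548\ub155\ud558\uc138\uc694")
korean_cp1252 <- iconv(korean, from = "UTF-8", to = "CP1252",
                       sub = "Unicode")
print(korean_cp1252)
```

The sketch covers only the encoding step; it does not show how Spark's JSON reader handles the resulting file.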