[ https://issues.apache.org/jira/browse/SPARK-30645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-30645.
----------------------------------
Fix Version/s: 3.0.0, 2.4.5
Resolution: Fixed
Issue resolved by pull request 27362
[https://github.com/apache/spark/pull/27362]
> collect() support Unicode characters test fails on Windows
> ----------------------------------------------------------
>
> Key: SPARK-30645
> URL: https://issues.apache.org/jira/browse/SPARK-30645
> Project: Spark
> Issue Type: Bug
> Components: SparkR, Tests
> Affects Versions: 3.0.0
> Reporter: Maciej Szymkiewicz
> Assignee: Maciej Szymkiewicz
> Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> As-is, the [test_that("collect() support Unicode characters"|https://github.com/apache/spark/blob/d5b92b24c41b047c64a4d89cc4061ebf534f0995/R/pkg/tests/fulltests/test_sparkSQL.R#L850-L869]
> test case seems to be system-dependent and doesn't work properly on Windows
> with the CP1252 English locale:
>
> {code:r}
> library(SparkR)
> SparkR::sparkR.session()
> Sys.info()
> # sysname release version
> # "Windows" "Server x64" "build 17763"
> # nodename machine login
> # "WIN-5BLT6Q610KH" "x86-64" "Administrator"
> # user effective_user
> # "Administrator" "Administrator"
> Sys.getlocale()
> # [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> lines <- c("{\"name\":\"안녕하세요\"}",
> "{\"name\":\"您好\", \"age\":30}",
> "{\"name\":\"こんにちは\", \"age\":19}",
> "{\"name\":\"Xin chào\"}")
> # Create the temporary file before inspecting it
> jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
> writeLines(lines, jsonPath)
> system(paste0("cat ", jsonPath))
> # {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"}
> # {"name":"<U+60A8><U+597D>", "age":30}
> # {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19}
> # {"name":"Xin chào"}
> # [1] 0
> df <- read.df(jsonPath, "json")
> printSchema(df)
> # root
> # |-- _corrupt_record: string (nullable = true)
> # |-- age: long (nullable = true)
> # |-- name: string (nullable = true)
> head(df)
> # _corrupt_record age name
> # 1 <NA> NA <U+C548><U+B155><U+D558><U+C138><U+C694>
> # 2 <NA> 30 <U+60A8><U+597D>
> # 3 <NA> 19 <U+3053><U+3093><U+306B><U+3061><U+306F>
> # 4 {"name":"Xin ch<U+FFFD>o"} NA <NA>
> {code}
> The problem becomes visible on AppVeyor when testthat is updated to 2.x, but
> is somehow silenced when testthat 1.x is used.
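The output above suggests that `writeLines()` translates the strings to the native CP1252 encoding: characters CP1252 cannot represent are written as literal `<U+XXXX>` text, while `à` is written as a single CP1252 byte that is not valid UTF-8, which would explain the `_corrupt_record` row and the `<U+FFFD>` replacement character. A minimal sketch of locale-independent test data follows, assuming the goal is simply to write a UTF-8 file regardless of the session locale; it is not necessarily the change made in pull request 27362. It builds the strings with `\u` escapes and passes `useBytes = TRUE` so the UTF-8 bytes reach the file unchanged.

```r
# Minimal sketch (assumption, not the upstream fix): make the test data
# independent of the session's native encoding.

# \u escapes keep the source ASCII-only and yield UTF-8 strings even when
# the native code page is CP1252.
lines <- c("{\"name\":\"\uc548\ub155\ud558\uc138\uc694\"}",
           "{\"name\":\"\u60a8\u597d\", \"age\":30}",
           "{\"name\":\"\u3053\u3093\u306b\u3061\u306f\", \"age\":19}",
           "{\"name\":\"Xin ch\u00e0o\"}")

jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")

# enc2utf8() guarantees UTF-8 input; useBytes = TRUE stops writeLines() from
# re-encoding marked strings to the native encoding, so the file holds UTF-8 bytes.
writeLines(enc2utf8(lines), jsonPath, useBytes = TRUE)

# Read the file back, declaring it as UTF-8, to confirm nothing was lost.
roundTrip <- readLines(jsonPath, encoding = "UTF-8")
```

With the file written this way, `read.df(jsonPath, "json")` from the reproduction above should receive valid UTF-8 input on every platform, rather than input whose bytes depend on the Windows code page.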
--
This message was sent by Atlassian Jira
(v8.3.4#803005)