[ https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731140#comment-14731140 ]
Jihong MA commented on SPARK-8951:
----------------------------------

This commit causes an R style check failure:

========================================================================
Running R style checks
========================================================================
Loading required package: methods

Attaching package: 'SparkR'

The following objects are masked from 'package:stats':

    filter, na.omit

The following objects are masked from 'package:base':

    intersect, rbind, sample, subset, summary, table, transform

Attaching package: 'testthat'

The following object is masked from 'package:SparkR':

    describe

R/deserialize.R:63:9: style: Trailing whitespace is superfluous.
    string
          ^
lintr checks failed.
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/lint-r ; received return code 1
Archiving unit tests logs...
> No log files found.
Attempting to post to Github...
 > Post successful.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Test FAILed.
Refer to this link for build results (access rights to CI server needed):

> support CJK characters in collect()
> -----------------------------------
>
>                 Key: SPARK-8951
>                 URL: https://issues.apache.org/jira/browse/SPARK-8951
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Jaehong Choi
>            Assignee: Jaehong Choi
>            Priority: Minor
>             Fix For: 1.6.0
>
>         Attachments: SerDe.scala.diff
>
>
> Spark gives an error message and does not show the output when a field of the
> result DataFrame contains CJK characters.
> I found out that SerDe in the R API currently supports only the ASCII format for
> strings, as noted in a comment in the source code.
> So I fixed SerDe.scala a little to support CJK, as in the attached file.
> I did not care about efficiency; I just wanted to see if it works.
> {noformat}
> people.json
> {"name":"가나"}
> {"name":"테스트123", "age":30}
> {"name":"Justin", "age":19}
>
> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
> Error in rawToChar(string) : embedded nul in string : '\0 \x98'
> {noformat}
>
> {code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
> // NOTE: Only works for ASCII right now
> def writeString(out: DataOutputStream, value: String): Unit = {
>   val len = value.length
>   out.writeInt(len + 1) // For the \0
>   out.writeBytes(value)
>   out.writeByte(0)
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
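The quoted `writeString` breaks on CJK because `value.length` counts characters while `writeBytes` drops the high byte of each char, so the byte count and payload disagree for multi-byte text. The attached SerDe.scala.diff is not reproduced here, but a minimal sketch of that kind of fix, encoding to UTF-8 and sending the byte length, could look like the following (the `SerDeSketch` object name and the test string are illustrative assumptions, not part of the patch):

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

object SerDeSketch {
  // Sketch of a CJK-safe writeString: encode to UTF-8 first and
  // write the *byte* length, not the character count, so the reader
  // receives exactly as many bytes as the length prefix promises.
  def writeString(out: DataOutputStream, value: String): Unit = {
    val utf8 = value.getBytes(StandardCharsets.UTF_8)
    out.writeInt(utf8.length + 1) // +1 for the trailing \0
    out.write(utf8)
    out.writeByte(0)
  }

  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val dos = new DataOutputStream(buf)
    writeString(dos, "가나") // 2 Hangul syllables = 6 bytes in UTF-8
    dos.flush()
    val bytes = buf.toByteArray
    // 4-byte int prefix + 6 payload bytes + 1 nul terminator
    assert(bytes.length == 4 + 6 + 1)
    // big-endian length prefix should read 7 (6 bytes + the \0)
    assert(bytes(3) == 7)
    println("ok")
  }
}
```

With the original code, `"가나".length` is 2, so the header promises 3 bytes while the char data is mangled by `writeBytes`; the UTF-8 version keeps the prefix and payload consistent, which is what the R side needs to avoid the "embedded nul in string" error.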