[ 
https://issues.apache.org/jira/browse/SPARK-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8951:
-----------------------------
    Target Version/s:   (was: 1.5.0)
            Priority: Minor  (was: Major)

Please first read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
We don't manage changes via patches, but via pull requests. Also, please don't 
set the target version.

This won't be specific to CJK, right? You are talking about supporting Unicode 
in general. Why introduce a null terminator? Just write the length of the UTF-8 
byte encoding, then send the UTF-8 bytes.

> support CJK characters in collect()
> -----------------------------------
>
>                 Key: SPARK-8951
>                 URL: https://issues.apache.org/jira/browse/SPARK-8951
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Jaehong Choi
>            Priority: Minor
>         Attachments: SerDe.scala.diff
>
>
> Spark gives an error message and does not show the output when a field of the 
> result DataFrame contains CJK characters.
> I found out that SerDe in the R API currently only supports ASCII strings, as 
> noted in a comment in the source code.
> So, I modified SerDe.scala a little to support CJK, as in the attached file.
> I did not care about efficiency; I just wanted to see if it works.
> {noformat}
> people.json
> {"name":"가나"}
> {"name":"테스트123", "age":30}
> {"name":"Justin", "age":19}
> df <- read.df(sqlContext, "./people.json", "json")
> head(df)
> Error in rawToChar(string) : embedded nul in string : '\0 \x98'
> {noformat}
> {code:title=core/src/main/scala/org/apache/spark/api/r/SerDe.scala}
>   // NOTE: Only works for ASCII right now
>   def writeString(out: DataOutputStream, value: String): Unit = {
>     val len = value.length // character count, not the UTF-8 byte count
>     out.writeInt(len + 1) // For the \0
>     out.writeBytes(value) // writeBytes keeps only the low byte of each char
>     out.writeByte(0)
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
