Itai Frenkel created STORM-437:
----------------------------------

             Summary: multilang JsonSerializer does not enforce inputstream UTF-8 encoding
                 Key: STORM-437
                 URL: https://issues.apache.org/jira/browse/STORM-437
             Project: Apache Storm (Incubating)
          Issue Type: Bug
    Affects Versions: 0.9.2-incubating
         Environment: AWS ubuntu 12.04 oracle java7
            Reporter: Itai Frenkel


On some machines, UTF-8 text gets corrupted when it passes over the multilang
protocol. Analysis traces the problem to JsonSerializer, which uses
InputStreamReader without an explicit charset when reading from stdin.
InputStreamReader then falls back to the JVM default charset, which is usually
UTF-8 but not always.
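
The effect of a charset mismatch can be reproduced outside Storm. The following
is a minimal sketch, not Storm code: the class CharsetMismatchDemo and the
helper readAll are hypothetical names, and ISO-8859-1 merely stands in for
whatever non-UTF-8 default an affected JVM might have.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: reads the whole stream as text using the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "{\"msg\":\"caf\u00e9\"}";
        // The sending side encodes the JSON message as UTF-8 bytes.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips the text.
        String good = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        // Decoding with a different charset (assumed here: ISO-8859-1) corrupts it.
        String bad = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println("round trip ok: " + good.equals(original));
        System.out.println("corrupted:     " + !bad.equals(original));
    }
}
```

The two-byte UTF-8 encoding of the accented character is decoded as two
separate characters under ISO-8859-1, which is the kind of corruption described
above.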

Temporary Workaround:
Edit storm/conf/storm.yaml and enforce the default JVM charset as follows:
worker.childopts: "-Xmx768m -Dfile.encoding=UTF-8"

Required Fix in JsonSerializer:
Pass the string "UTF-8" to the InputStreamReader constructor as the second argument.
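
A minimal sketch of what that change could look like, assuming the serializer
wraps its input stream in a BufferedReader. The class Utf8ReaderSketch and the
helper utf8Reader are hypothetical names for illustration and are not taken
from the Storm source.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class Utf8ReaderSketch {
    // Hypothetical helper: always decodes the stream as UTF-8,
    // regardless of the JVM default charset (file.encoding).
    public static BufferedReader utf8Reader(InputStream in) throws IOException {
        // The two-argument constructor takes the charset name as a string and
        // throws UnsupportedEncodingException (an IOException) if it is unknown.
        return new BufferedReader(new InputStreamReader(in, "UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes a multilang subprocess would send.
        byte[] bytes = "{\"msg\":\"caf\u00e9\"}\n".getBytes("UTF-8");
        BufferedReader reader = utf8Reader(new ByteArrayInputStream(bytes));
        String line = reader.readLine();
        System.out.println("decoded correctly: " + line.equals("{\"msg\":\"caf\u00e9\"}"));
    }
}
```

With the charset named explicitly, the -Dfile.encoding workaround should no
longer be needed for reading.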

Notes: 
The implementation already enforces UTF-8 when writing to stdout, so no fix is
needed there.
Python simplejson and the Ruby json gem use UTF-8 as the default.




--
This message was sent by Atlassian JIRA
(v6.2#6252)
