Itai Frenkel created STORM-437:
----------------------------------
Summary: multilang JsonSerializer does not enforce inputstream
UTF-8 encoding
Key: STORM-437
URL: https://issues.apache.org/jira/browse/STORM-437
Project: Apache Storm (Incubating)
Issue Type: Bug
Affects Versions: 0.9.2-incubating
Environment: AWS ubuntu 12.04 oracle java7
Reporter: Itai Frenkel
On some machines UTF-8 text gets corrupted over the multilang protocol. Analysis
traces the problem to JsonSerializer's use of InputStreamReader when reading
from stdin.
When no charset is given, InputStreamReader uses the JVM default charset, which
is usually UTF-8 but not always.
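To illustrate the failure mode, here is a minimal sketch that decodes the same UTF-8 bytes with two different charsets. The class and method names (DefaultCharsetDemo, decodeLine) are made up for this example and are not part of Storm. ISO-8859-1 stands in for a non-UTF-8 JVM default, since the actual default on the affected machines is not stated here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Storm code.
class DefaultCharsetDemo {

    // Decodes the first line of the given bytes with the given charset,
    // similar to a reader wrapped around a subprocess stream.
    static String decodeLine(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // JSON text containing one non-ASCII character (e with acute accent).
        String original = "{\"msg\": \"caf\u00e9\"}";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // This is the charset a single-argument InputStreamReader would use.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Correct charset: the text round-trips unchanged.
        System.out.println(original.equals(decodeLine(utf8, StandardCharsets.UTF_8)));

        // Wrong charset (assumed stand-in for a non-UTF-8 default):
        // the two-byte sequence becomes two separate characters.
        System.out.println(original.equals(decodeLine(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

The first comparison should print true and the second false, because the two UTF-8 bytes of the accented character are decoded as two separate ISO-8859-1 characters.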
Temporary Workaround:
Edit storm/conf/storm.yaml and enforce the default JVM charset as follows:
worker.childopts: "-Xmx768m -Dfile.encoding=UTF-8"
Required Fix in JsonSerializer:
Pass the string "UTF-8" to the InputStreamReader constructor as the second
argument, so that decoding no longer depends on the JVM default charset.
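A minimal sketch of that change, assuming the reader is built directly around the input stream. The class and helper names (Utf8ReaderSketch, utf8Reader, firstLine) are hypothetical and this is not the actual JsonSerializer code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical sketch class, not Storm code.
class Utf8ReaderSketch {

    // Wraps the stream with an explicit charset name as the second
    // constructor argument, instead of relying on the JVM default.
    static BufferedReader utf8Reader(InputStream in) throws IOException {
        return new BufferedReader(new InputStreamReader(in, "UTF-8"));
    }

    // Reads the first line of the given bytes through the UTF-8 reader.
    static String firstLine(byte[] bytes) throws IOException {
        try (BufferedReader reader = utf8Reader(new ByteArrayInputStream(bytes))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"msg\": \"caf\u00e9\"}";
        byte[] bytes = (json + "\n").getBytes("UTF-8");
        // Compares the decoded line with the original text; the result
        // does not depend on -Dfile.encoding.
        System.out.println(json.equals(firstLine(bytes)));
    }
}
```

On Java 7 and later, passing java.nio.charset.StandardCharsets.UTF_8 instead of the string avoids the checked UnsupportedEncodingException.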
Notes:
The implementation already enforces UTF-8 when writing to stdout, so no fix is
needed on the write path.
Python's simplejson and Ruby's json gem use UTF-8 by default.
--
This message was sent by Atlassian JIRA
(v6.2#6252)