Steve Lawrence created DAFFODIL-2561:
----------------------------------------
Summary: Fix uses of getBytes without an encoding specified
Key: DAFFODIL-2561
URL: https://issues.apache.org/jira/browse/DAFFODIL-2561
Project: Daffodil
Issue Type: Bug
Components: Clean Ups
Reporter: Steve Lawrence
Comment from [~interran] in a pull request:
{quote}I reviewed how we call getBytes in Daffodil in order to check for
inconsistencies and best practices. I noticed two things: 1) we call
getBytes("ascii") in nearly every place where we want bytes from ASCII
characters; and 2) we call getBytes without a charset name too many times.
Java's platform default charset depends on the user's locale and OS. On many
modern Linux systems, it's UTF-8. On Macs, it can be MacRoman. On Windows in
the US and Western Europe, it's usually CP1252, while in Central Europe it's
often CP1250, and in China it's a simplified Chinese encoding such as GBK (Big5
is a traditional Chinese encoding). I'm agnostic about whether we use "ascii", "US-ASCII", or
import java.nio.charset.StandardCharsets and use StandardCharsets.US_ASCII (I
see Daffodil most often uses all-lowercase strings, which save space and
typing), but we probably should create a bug to replace all parameter-less
getBytes calls with getBytes("utf-8").{quote}
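A minimal Java sketch of the points in the comment (Daffodil is mostly Scala, but getBytes and StandardCharsets are Java APIs). It shows that "ascii", "US-ASCII" and StandardCharsets.US_ASCII name the same charset, and that non-ASCII text encodes differently depending on the charset, which is what makes a parameter-less getBytes() machine-dependent. The class and method names (CharsetDemo, encode) are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetDemo {
    // Encodes with an explicit Charset. Unlike s.getBytes(), the result does
    // not depend on the JVM's default charset, and unlike getBytes(String)
    // it cannot throw the checked UnsupportedEncodingException.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // Charset names are case-insensitive and "ASCII" is an alias of the
        // canonical name "US-ASCII", so these all resolve to the same charset.
        System.out.println(Charset.forName("ascii").equals(StandardCharsets.US_ASCII));    // true
        System.out.println(Charset.forName("US-ASCII").equals(StandardCharsets.US_ASCII)); // true

        // Pure ASCII text encodes to identical bytes in US-ASCII and UTF-8.
        System.out.println(Arrays.equals(encode("abc", StandardCharsets.US_ASCII),
                                         encode("abc", StandardCharsets.UTF_8)));         // true

        // Non-ASCII text does not: LATIN SMALL LETTER E WITH ACUTE is two bytes
        // in UTF-8 but one byte in ISO-8859-1. A unicode escape is used so the
        // example does not depend on the source file's own encoding.
        String eAcute = "\u00e9";
        System.out.println(encode(eAcute, StandardCharsets.UTF_8).length);      // 2
        System.out.println(encode(eAcute, StandardCharsets.ISO_8859_1).length); // 1

        // Whatever this prints is what a parameter-less getBytes() uses here.
        System.out.println(Charset.defaultCharset());
    }
}
```

Of the three spellings, StandardCharsets.US_ASCII (or StandardCharsets.UTF_8) avoids both the runtime name lookup and the checked exception that getBytes(String) declares.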
I *think* most or all of our uses of getBytes that don't provide an encoding are
in tests. But even if this doesn't affect the Daffodil source, it does make our
tests depend on each user's default encoding, and we are not consistent at all.
We should fix this so that every use provides an encoding and our encodings are
consistent.
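To check whether the remaining charset-less calls really are confined to tests, a rough, line-based finder could help. This is a hypothetical sketch (GetBytesScanner is not part of Daffodil) that assumes the sources are .scala and .java files encoded in UTF-8; it flags getBytes calls with no argument, including Scala's paren-less form, and will report false positives when an argument list is continued on the next line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Daffodil: a rough finder for getBytes
// calls that do not pass a charset.
final class GetBytesScanner {
    // Matches ".getBytes" unless it is directly followed by a non-empty
    // argument list. Flags ".getBytes", ".getBytes()" and ".getBytes ( )"
    // (Scala lets callers omit the empty parentheses), but not
    // .getBytes("utf-8") or .getBytes(StandardCharsets.UTF_8).
    static final Pattern NO_CHARSET =
        Pattern.compile("\\.getBytes\\b(?!\\s*\\(\\s*[^\\s)])");

    static boolean lacksCharset(String line) {
        return NO_CHARSET.matcher(line).find();
    }

    // Returns "path:line: text" for each flagged line under root. This is a
    // line-based heuristic: an argument list that continues on the next line
    // is reported as a false positive.
    static List<String> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> files = Files.walk(root)) {
            sources = files
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".scala") || p.toString().endsWith(".java"))
                .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path p : sources) {
            // Assumes UTF-8 sources; read them with an explicit charset too.
            List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                if (lacksCharset(lines.get(i))) {
                    hits.add(p + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

Running it against the repository root prints one line per suspect call, which can then be rewritten to pass a StandardCharsets constant or a consistent charset name.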
--
This message was sent by Atlassian Jira
(v8.3.4#803005)