Steve Lawrence created DAFFODIL-2561:
----------------------------------------

             Summary: Fix uses of getBytes without an encoding specified
                 Key: DAFFODIL-2561
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2561
             Project: Daffodil
          Issue Type: Bug
          Components: Clean Ups
            Reporter: Steve Lawrence


Comment from [~interran] in a pull request:
{quote}I reviewed how we call getBytes in Daffodil in order to check for 
inconsistencies and best practices. I noticed two things: 1) we call 
getBytes("ascii") in every other place where we want bytes from ASCII 
characters; and 2) we call getBytes without a charset name too many times. 
Java's platform default charset is specific to the user and OS. On many modern 
Linux systems, it's UTF-8. On Macs, it’s MacRoman. In the US on Windows, it's 
often CP1250, while in Europe it's CP1252 or in China it's often simplified 
Chinese (Big5 or a GB*). I'm agnostic whether we use "ascii", "US-ASCII", or 
import java.nio.charset.StandardCharsets and use StandardCharsets.US_ASCII (I 
see Daffodil typically uses all-lowercase strings most often to save space and 
typing), but we probably should create a bug to replace all parameter-less 
getBytes calls with getBytes("utf-8").{quote}

I *think* most/all of our uses of getBytes that don't provide an encoding are 
in tests. But even if it doesn't affect the Daffodil source, it does make our 
tests fragile to a user's default encoding, and we are not consistent at all. We 
should fix this so all uses provide an encoding, and our encodings are consistent.
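
For illustration only, here is a minimal Java sketch of the difference between 
the parameter-less call and an explicit charset. The class and helper names 
(GetBytesExample, utf8Bytes, asciiBytes) are hypothetical and not taken from the 
Daffodil source; which constant or string we standardize on is still open.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Daffodil.
class GetBytesExample {

    // Explicit charset: the result is the same on every platform.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Explicit US-ASCII: characters outside ASCII are replaced with '?'.
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Parameter-less call: depends on the JVM's default charset,
        // so the length can differ between machines.
        byte[] implicit = text.getBytes();
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit length: " + implicit.length);

        // UTF-8 encodes the accented character as two bytes (0xC3 0xA9).
        System.out.println("utf-8 length: " + utf8Bytes(text).length);

        // US-ASCII cannot represent it, so one '?' byte is substituted.
        System.out.println("ascii length: " + asciiBytes(text).length);
    }
}
```

Note that in Java the String overload, e.g. getBytes("utf-8"), declares the 
checked UnsupportedEncodingException, while the Charset overload shown here 
does not. That may be a reason to prefer the StandardCharsets constants, though 
either form removes the dependence on the platform default.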



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
