[ 
https://issues.apache.org/jira/browse/DAFFODIL-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2561:
------------------------------------
    Priority: Trivial  (was: Minor)

> Fix uses of getBytes without an encoding specified
> --------------------------------------------------
>
>                 Key: DAFFODIL-2561
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2561
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Clean Ups
>            Reporter: Steve Lawrence
>            Priority: Trivial
>
> Comment from [~interran] in a pull request:
> {quote}I reviewed how we call getBytes in Daffodil in order to check for 
> inconsistencies and best practices. I noticed two things: 1) we call 
> getBytes("ascii") instead in every other place where we want bytes from ASCII 
> characters; and 2) we call getBytes without a charset name too many times. 
> Java's platform default charset is specific to the user and OS. On many 
> modern Linux systems, it's UTF-8. On Macs, it has historically been MacRoman. 
> On Windows, it's often CP1252 in the US and Western Europe and CP1250 in 
> Central Europe, while in Chinese locales it's often Big5 (traditional) or a 
> GB* encoding (simplified). I'm agnostic about whether we use 
> "ascii", "US-ASCII", or import java.nio.charset.StandardCharsets and use 
> StandardCharsets.US_ASCII (I see Daffodil most often uses all-lowercase 
> strings to save space and typing), but we probably should create a 
> bug to replace all parameter-less getBytes calls with getBytes("utf-8").
> {quote}
> I *think* most/all of our uses of getBytes that don't provide an encoding are 
> in tests. But even if it doesn't affect the Daffodil source, it does make our 
> tests fragile to a user's encoding, and we are not consistent at all. We 
> should fix this so all uses provide an encoding, and our encodings are 
> consistent.
> Additionally, the String class has a constructor that accepts a byte array and 
> an optional encoding. The same issue occurs if one does not provide an 
> encoding. We should find all uses of this constructor and ensure they use an 
> encoding.
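
A minimal sketch of the change described above, assuming UTF-8 is the encoding chosen for the cleanup (the issue leaves the choice between ASCII and UTF-8 open). The class and method names are hypothetical and are not taken from the Daffodil source:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Daffodil.
final class ExplicitCharsetExample {

    // Encode with an explicit charset. A bare s.getBytes() would use the
    // platform default charset, which varies by user and OS.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset. A bare new String(bytes)
    // would also fall back to the platform default charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"; the last character is non-ASCII
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length);                   // 5: 'é' takes two bytes in UTF-8
        System.out.println(decodeUtf8(bytes).equals(text)); // true
    }
}
```

Passing a StandardCharsets constant rather than a charset name such as "utf-8" also avoids the checked UnsupportedEncodingException that getBytes(String) and new String(byte[], String) declare.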



--
This message was sent by Atlassian Jira
(v8.20.10#820010)