[jira] [Commented] (DAFFODIL-2561) Fix uses of getBytes without an encoding specified

Mike Beckerle (Jira) Thu, 23 Sep 2021 05:35:07 -0700


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419171#comment-17419171
 ]


Mike Beckerle commented on DAFFODIL-2561:
-----------------------------------------

It would be great if we could put custom rules into our SonarQube scanner to 
find things like getBytes calls that don't specify an encoding, or replaceAll 
calls that don't use the quoteEscaping, and have each rule specify a message 
suggesting the required improvement. 

For cases like the quoteEscaping one, I'd like the rule to suggest calling a 
library function we provide instead of calling replaceAll directly, since you 
either need to use the quoteEscaping, or there should be an 
Assert.invariant(!x.contains("$")) on the second argument to replaceAll.  
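
To make the replaceAll concern concrete, here is a minimal Java sketch of the 
kind of library function meant above. It assumes that "quoteEscaping" 
corresponds to java.util.regex.Matcher.quoteReplacement; the helper name 
replaceAllLiteral is hypothetical and is not an existing Daffodil function.

```java
import java.util.regex.Matcher;

final class ReplaceAllSketch {
    // Hypothetical helper (not part of Daffodil): replaces every match of
    // regex with the replacement taken literally, so "$" and "\" in the
    // replacement are not interpreted as group references or escapes.
    static String replaceAllLiteral(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Calling "cost: PRICE".replaceAll("PRICE", "$100") directly would
        // throw IndexOutOfBoundsException, because "$1" is read as a
        // reference to capture group 1 and the regex has no groups.
        System.out.println(replaceAllLiteral("cost: PRICE", "PRICE", "$100"));
        // prints "cost: $100"
    }
}
```

Routing callers through one helper like this would give a scanner rule a 
single, specific thing to suggest in its message.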

Adding all these things to our "code review" page, and relying on people to 
remember to check for them, is hard to keep up in practice. 

> Fix uses of getBytes without an encoding specified
> --------------------------------------------------
>
>                 Key: DAFFODIL-2561
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2561
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Clean Ups
>            Reporter: Steve Lawrence
>            Priority: Major
>
> Comment from [~interran] in a pull request:
> {quote}I reviewed how we call getBytes in Daffodil in order to check for 
> inconsistencies and best practices. I noticed two things: 1) we call 
> getBytes("ascii") instead in every other place where we want bytes from 
> ASCII characters; and 2) we call getBytes without a charset name too many 
> times. 
> Java's platform default charset is specific to the user and OS. On many 
> modern Linux systems, it's UTF-8. On Macs, it may be MacRoman. On Windows in 
> the US and Western Europe, it's often CP1252, while in Central Europe it's 
> CP1250 and in China it's often a simplified Chinese charset (a GB*). I'm 
> agnostic whether we use "ascii", "US-ASCII", or import 
> java.nio.charset.StandardCharsets and use StandardCharsets.US_ASCII (I see 
> Daffodil most often uses all-lowercase strings, to save space and typing), 
> but we probably should create a bug to replace all parameter-less getBytes 
> calls with getBytes("utf-8").
> {quote}
> I *think* most/all of our uses of getBytes that don't provide an encoding are 
> in tests. But even if it doesn't affect the Daffodil source, it does make our 
> tests fragile to a user's encoding, and we are not consistent at all. We 
> should fix this so all uses provide an encoding, and our encodings are 
> consistent.
> Additionally, the String class has a constructor that accepts a byte array 
> and an optional encoding. The same issue occurs if one does not provide an 
> encoding. We should find all uses of this constructor and ensure they use an 
> encoding.
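
As an illustration of the fix described in the issue, the following Java sketch 
passes an explicit charset to both String.getBytes and the String constructor. 
The class and method names are hypothetical, and StandardCharsets.UTF_8 is only 
one of the options mentioned above; which charset Daffodil standardizes on is 
left open here.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetSketch {
    // Encode with an explicit charset so the bytes do not depend on the
    // platform default charset of whoever runs the code or tests.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset; new String(bytes) with no
    // charset would fall back to the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length);                   // prints 5
        System.out.println(decodeUtf8(bytes).equals(text)); // prints true
    }
}
```

Using the StandardCharsets constants rather than a charset name string also 
avoids the checked UnsupportedEncodingException that getBytes(String) declares.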



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
