[ https://issues.apache.org/jira/browse/SUREFIRE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gian Merlino updated SUREFIRE-1865:
-----------------------------------
    Description: 
ChecksumCalculator does the following in getSha1:

{code}
md.update( configValue.getBytes( ISO_8859_1 ), 0, configValue.length() );
{code}

This passes the wrong length, because {{configValue.length()}} counts 
characters (UTF-16 code units), not bytes. The length is therefore wrong for any 
string whose encoded byte count differs from its character count.
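
A minimal Java sketch of the mismatch, for illustration only. The class and 
method names ({{LengthMismatchDemo}}, {{isoByteCount}}, {{updateRejects}}) are 
hypothetical and not part of Surefire. As far as I can tell, with ISO_8859_1 an 
emoji stored as a surrogate pair (two chars) collapses to a single replacement 
byte, so the byte array is shorter than {{configValue.length()}} and 
{{MessageDigest.update}} rejects the call:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of Surefire.
final class LengthMismatchDemo
{
    // Number of bytes produced when the value is encoded as ISO-8859-1.
    static int isoByteCount( String value )
    {
        return value.getBytes( StandardCharsets.ISO_8859_1 ).length;
    }

    // Repeats the call quoted in the report and reports whether
    // MessageDigest.update refuses the character-based length.
    static boolean updateRejects( String configValue )
    {
        MessageDigest md;
        try
        {
            md = MessageDigest.getInstance( "SHA-1" );
        }
        catch ( NoSuchAlgorithmException e )
        {
            throw new IllegalStateException( e );
        }

        try
        {
            md.update( configValue.getBytes( StandardCharsets.ISO_8859_1 ), 0, configValue.length() );
            return false;
        }
        catch ( IllegalArgumentException e )
        {
            // Thrown when the requested length exceeds the byte array.
            return true;
        }
    }

    public static void main( String[] args )
    {
        String emoji = "\uD83D\uDE00"; // one emoji, stored as two chars
        System.out.println( "chars=" + emoji.length() + " bytes=" + isoByteCount( emoji ) );
        System.out.println( "update rejected: " + updateRejects( emoji ) );
    }
}
```

Strings made only of characters outside ISO_8859_1 but inside the Basic 
Multilingual Plane (for example Cyrillic) keep a matching length here, yet 
every such character is replaced by the same byte, so distinct values can 
produce the same checksum.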

Additionally, I believe that this class can be used to compute checksums on 
strings that fall outside the ISO_8859_1 character set, so UTF_8 would be a 
better choice.
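
A sketch of what a fix could look like, assuming the only requirement is a 
stable SHA-1 over the full encoded value. The class {{Utf8Sha1Sketch}}, the 
method {{sha1Hex}}, and the uppercase hex output are assumptions for 
illustration, not the actual Surefire code. The length is taken from the byte 
array, so it stays correct whatever the charset:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the actual ChecksumCalculator implementation.
final class Utf8Sha1Sketch
{
    static String sha1Hex( String configValue )
    {
        MessageDigest md;
        try
        {
            md = MessageDigest.getInstance( "SHA-1" );
        }
        catch ( NoSuchAlgorithmException e )
        {
            throw new IllegalStateException( e );
        }

        // Encode once and use the byte array's own length.
        byte[] bytes = configValue.getBytes( StandardCharsets.UTF_8 );
        md.update( bytes, 0, bytes.length );

        // Render the digest as hex (output format is an assumption).
        StringBuilder hex = new StringBuilder();
        for ( byte b : md.digest() )
        {
            hex.append( String.format( "%02X", b & 0xFF ) );
        }
        return hex.toString();
    }

    public static void main( String[] args )
    {
        System.out.println( sha1Hex( "\u041F\u0440\u0438\u0432\u0435\u0442 \uD83D\uDE00" ) );
    }
}
```

Note that switching the charset to UTF_8 without also fixing the length 
argument would not be enough: in UTF-8 many characters take more than one byte, 
so a character-based length would silently truncate the input.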

I ran into this when defining a test property that contained Cyrillic 
characters and emojis.

  was:
ChecksumCalculator does the following in getSha1:

{code}
md.update( configValue.getBytes( ISO_8859_1 ), 0, configValue.length() );
{code}

This isn't right, because {{configValue.length()}} is a length in characters, 
not bytes. This will lead to the wrong length being used for any strings that 
contain characters that aren't encoded in a single byte.


> ChecksumCalculator getSha1 does not compute checksums correctly
> ---------------------------------------------------------------
>
>                 Key: SUREFIRE-1865
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1865
>             Project: Maven Surefire
>          Issue Type: Bug
>            Reporter: Gian Merlino
>            Priority: Major
>
> ChecksumCalculator does the following in getSha1:
> {code}
> md.update( configValue.getBytes( ISO_8859_1 ), 0, configValue.length() );
> {code}
> This passes the wrong length, because {{configValue.length()}} counts 
> characters (UTF-16 code units), not bytes. The length is therefore wrong for 
> any string whose encoded byte count differs from its character count.
> Additionally, I believe that this class can be used to compute checksums on 
> strings that fall outside the ISO_8859_1 character set, so UTF_8 would be a 
> better choice.
> I ran into this when defining a test property that contained Cyrillic 
> characters and emojis.



