[PR] Fix reading UTF-8 strings (from OPC UA nodes) (plc4x)

via GitHub Tue, 07 Mar 2023 07:56:37 -0800


Planet-X opened a new pull request, #832:
URL: https://github.com/apache/plc4x/pull/832


   When reading string values via the OPC UA protocol, a 
`StringIndexOutOfBoundsException` was thrown whenever the string contained 
non-ASCII characters. Upon further inspection, the problem could be pinned down 
to the logic in `readString()` of `ReadBufferByteBased`.
   
   The `readString()` function in `ReadBufferByteBased` could throw a 
`StringIndexOutOfBoundsException` upon calling `substring()`. The reason is that 
the calculated `realLength` is a byte count, whereas the length of the created 
string is the number of characters. In UTF encodings, a single character can 
occupy multiple bytes. `realLength` can therefore exceed the length of the 
decoded string, which causes the `substring()` call to fail.
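
A minimal sketch of the mismatch, independent of PLC4X. The class and method names are hypothetical and only mimic the decode-then-`substring()` pattern described above; this is not the actual `readString()` code.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthMismatch {

    /** Number of bytes the text occupies when encoded as UTF-8. */
    public static int utf8ByteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Mimics the failing pattern: decode first, then cut using a byte count. */
    public static String decodeThenSubstring(byte[] bytes, int realLength) {
        return new String(bytes, StandardCharsets.UTF_8).substring(0, realLength);
    }

    public static void main(String[] args) {
        // "Grüße" written with Unicode escapes: 5 characters, 7 bytes in UTF-8.
        String text = "Gr\u00fc\u00dfe";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars, " + bytes.length + " bytes");

        try {
            // The byte count (7) is used as a character index into a 5-char string.
            decodeThenSubstring(bytes, bytes.length);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring() failed: " + e.getMessage());
        }
    }
}
```

For pure ASCII input both counts are equal, which would explain why the problem only showed up with non-ASCII characters.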
   
   This PR fixes the issue by applying `realLength` to the byte-length instead 
of the string length: The byte array is sliced to length `realLength` and then 
converted to the final string. `substring()` isn't used anymore.
   The fix can be verified via the included regression test and has 
additionally been tested locally using the OPC UA protocol.
   Maybe other protocols were also affected?
   
   Additionally, I've included some other minor fixes to the logic. See the 
individual commits for details.
   
   Regards,
   Marc
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@plc4x.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
