pjfawcett commented on PR #438: URL: https://github.com/apache/qpid-proton/pull/438#issuecomment-2634465212
I share your indecision. The current usage of this binding by user code is unknown so we can't be certain what a single "right" solution would be. I put in the 'expected exception' in the unit test to highlight the issue with encoding with Latin-1. I decided to go back and dig into the behaviour of version 0.38.0, which uses SWIG instead of CFFI. After picking through the code, including the SWIG generated C wrapper code, I ascertained that: 1. When setting the tag bytes from a Python string, [PyUnicode_AsUTF8AndSize](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize) is called, which has the same behaviour as `encode("utf-8", errors="strict")` 2. When reading the tag bytes into a Python string, [PyUnicode_DecodeUTF8](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_DecodeUTF8) is called with `errors="surrogateescape"`, which has the same behaviour as `decode("utf-8", errors="surrogateescape")` This combination allows for round tripping of Unicode strings when going string -> tag -> string and also accepts non-utf8 byte sequences in incoming tags. (Although the resultant strings from such tags would cause exceptions if used as a new tag, since the encoding "fails if the string contains surrogate code points") So, in order to maintain backwards compatibility, and to fix the problem I encountered with some tags causing exceptions when read, I propose to implement the encoding/decoding using the parameters outlined above. Having said that, it is annoying that one can't easily get to the raw byte values for the tags. The current, 0.40.0, implementation allows the setting of the tag form a `bytes` value as so bypasses codecs. I think it would be good to have more direct access to the incoming tag bytes. Could I suggest that a `btag` property is added to the `Delivery` class, alongside the existing `tag` property, to get the tag a `bytes`. I'm happy to do this, and to raise a separate ticket / PR if required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org