pjfawcett commented on PR #438:
URL: https://github.com/apache/qpid-proton/pull/438#issuecomment-2634465212

   I share your indecision.  The current usage of this binding by user code is 
unknown so we can't be certain what a single "right" solution would be.  I put 
in the 'expected exception' in the unit test to highlight the issue with 
encoding with Latin-1.
   
   I decided to go back and dig into the behaviour of version 0.38.0, which 
uses SWIG instead of CFFI.
   
   After picking through the code, including the SWIG generated C wrapper code, 
I ascertained that:
   
   1. When setting the tag bytes from a Python string, 
[PyUnicode_AsUTF8AndSize](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize)
 is called, which has the same behaviour as `encode("utf-8", errors="strict")`
   2. When reading the tag bytes into a Python string, 
[PyUnicode_DecodeUTF8](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_DecodeUTF8)
 is called with `errors="surrogateescape"`, which has the same behaviour as 
`decode("utf-8", errors="surrogateescape")`
   
   This combination allows for round tripping of Unicode strings when going 
string -> tag -> string and also accepts non-utf8 byte sequences in incoming 
tags.  (Although the resultant strings from such tags would cause exceptions if 
used as a new tag, since the  
    encoding "fails if the string contains surrogate code points")
   
   So, in order to maintain backwards compatibility, and to fix the problem I 
encountered with some tags causing exceptions when read, I propose to implement 
the encoding/decoding using the parameters outlined above.
   
   Having said that, it is annoying that one can't easily get to the raw byte 
values for the tags.  The current, 0.40.0, implementation allows the setting of 
the tag form a `bytes` value as so bypasses codecs.  I think it would be good 
to have more direct access to the incoming tag bytes.
   
   Could I suggest that a `btag` property is added to the `Delivery` class, 
alongside the existing `tag` property, to get the tag a `bytes`.
   
   I'm happy to do this, and to raise a separate ticket / PR if required.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to