zbentley opened a new issue, #15786:
URL: https://github.com/apache/pulsar/issues/15786

   **Describe the bug**
   Message properties can be set to (and published with) values that cannot 
be deserialized by the consumer.
   
   **To Reproduce**
   1. Using the Python client, publish a message on any topic with 
`properties={'foo': b'\x01-\x00\x97'}`
   2. Using a Python consumer, consume that message and attempt to access 
`message.properties()`. 
   3. Observe that a `UnicodeDecodeError` is raised.
   4. Repeat steps 1-3 with the same bytes as a key instead of a value: 
`properties={b'\x01-\x00\x97': 'foo'}`
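   The failure can be reproduced without a broker. The sketch below is a hypothetical model of the round trip, not the client's actual code path. It assumes that the producer forwards keys and values as raw bytes and that `message.properties()` decodes them as UTF-8. `simulate_roundtrip` is an illustrative helper, not a Pulsar API. The byte `\x97` is a UTF-8 continuation byte with no lead byte before it, so `b'\x01-\x00\x97'` is not valid UTF-8.

```python
def _to_bytes(x):
    # Accept both str and bytes, as the Python client appears to on publish.
    return x if isinstance(x, bytes) else x.encode("utf-8")


def simulate_roundtrip(properties):
    """Hypothetical stand-in for producer.send() followed by message.properties()."""
    # Publish side (assumed): keys and values become opaque byte strings on the wire.
    wire = {_to_bytes(k): _to_bytes(v) for k, v in properties.items()}
    # Consume side (assumed): everything is decoded back to str as UTF-8.
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in wire.items()}


for props in ({"foo": b"\x01-\x00\x97"}, {b"\x01-\x00\x97": "foo"}):
    try:
        simulate_roundtrip(props)
    except UnicodeDecodeError as exc:
        print("cannot decode", props, "->", exc.reason)
```

Under the same assumption, a `bytes` value that happens to be valid UTF-8, such as `b'abc'`, does not raise. It comes back as the `str` `'abc'`, so its type is still not preserved.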
   
   **Expected behavior**
   Properties should be round-trippable: they should be deserialized with the 
same types and values with which they were set, and should not raise exceptions 
on deserialization.
   
   There are three possible solutions here:
   1. Require that all property keys and values be `bytes` objects in Python. 
This is easy to implement inside the client, but breaks backwards compatibility.
   1. Encode type information along with property keys and values, and have the 
consumer deserialize them to the appropriate types. This is harder to implement 
inside the client (it doesn't seem to use `google.protobuf.Value`s on the wire 
at the moment, but I may be misreading the code).
   1. Less preferable: require that all keys and values be `str` objects in 
Python. This is more restrictive than the protocol allows, but is probably 
simpler to implement.
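   A minimal sketch of option 2 follows. It assumes that properties must remain `str`-to-`str` pairs on the wire. Each key and value is prefixed with a type tag, and `bytes` are base64-encoded so the transported text is always valid UTF-8. The `s:`/`b:` tags and all helper names are hypothetical. They are not part of Pulsar's protocol or client.

```python
import base64
from typing import Dict, Union

PropertyItem = Union[str, bytes]

# Hypothetical type tags; not part of any Pulsar wire format.
_STR_TAG = "s:"
_BYTES_TAG = "b:"


def encode_item(x: PropertyItem) -> str:
    """Tag a str or bytes so its type survives a UTF-8-only transport."""
    if isinstance(x, str):
        return _STR_TAG + x
    if isinstance(x, bytes):
        # base64 output is pure ASCII, so it is always valid UTF-8.
        return _BYTES_TAG + base64.b64encode(x).decode("ascii")
    raise TypeError(f"unsupported property type: {type(x).__name__}")


def decode_item(s: str) -> PropertyItem:
    """Invert encode_item, restoring the original type."""
    if s.startswith(_STR_TAG):
        return s[len(_STR_TAG):]
    if s.startswith(_BYTES_TAG):
        return base64.b64decode(s[len(_BYTES_TAG):])
    raise ValueError(f"untagged property: {s!r}")


def encode_properties(props: Dict[PropertyItem, PropertyItem]) -> Dict[str, str]:
    # Producer side: every key and value becomes a tagged str.
    return {encode_item(k): encode_item(v) for k, v in props.items()}


def decode_properties(props: Dict[str, str]) -> Dict[PropertyItem, PropertyItem]:
    # Consumer side: tags are stripped and original types restored.
    return {decode_item(k): decode_item(v) for k, v in props.items()}


# Both cases from the report now round-trip with their original types.
original = {"foo": b"\x01-\x00\x97", b"\x01-\x00\x97": "foo"}
assert decode_properties(encode_properties(original)) == original
```

One drawback of this design is visible here: consumers in other languages that do not know the tagging convention would see `s:foo` rather than `foo`. A scheme like this would therefore need to be shared by every Pulsar client rather than added to the Python client alone.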
   
   **Environment:**
   macOS 12 (x86), Pulsar standalone 2.10, Pulsar client 2.10, Python 3.7.13.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
