baynes opened a new issue #8344:
URL: https://github.com/apache/pulsar/issues/8344


   If you look at http://pulsar.apache.org/docs/en/functions-develop/#serde 
under the Python tab it says
   
   "In Python, the default SerDe is identity, meaning that the type is 
serialized as whatever type the producer function returns."
   "You can use the IdentitySerde, which leaves the data unchanged. The 
IdentitySerDe is the default."
   
   This strongly gives the impression that the default `IdentitySerDe` does not 
change the message in any way. This is not the case: it attempts to convert 
incoming bytes to one of float, int, or string, and leaves them as bytes only 
when every conversion fails. This can result in unexpected conversions (we have 
had binary data unexpectedly converted to a string).
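   
   To make the problem concrete, here is a minimal sketch of the kind of 
guessing behaviour described above. It is not the Pulsar source: the function 
name `guessing_deserialize` is hypothetical, and the order in which the types 
are tried is an assumption made for illustration.

```python
def guessing_deserialize(input_bytes: bytes):
    """Hypothetical sketch: try text-based conversions, fall back to raw bytes."""
    try:
        text = input_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the payload is returned unchanged as bytes.
        return input_bytes
    # Assumed order of attempts; the real implementation may differ.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    # Any payload that happens to be valid UTF-8 ends up as a str.
    return text


print(repr(guessing_deserialize(b"42")))        # an int, not bytes
print(repr(guessing_deserialize(b"3.5")))       # a float, not bytes
print(repr(guessing_deserialize(b"\x01\x02")))  # binary data silently becomes a str
print(repr(guessing_deserialize(b"\xff\xfe")))  # invalid UTF-8 stays as bytes
```

   The third call is the surprising one: a two-byte binary payload that 
happens to be valid UTF-8 reaches the function as a `str` rather than `bytes`.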
   
   It also attempts the reverse on the function result. Fortunately this does 
not result in unexpected behaviour, though it does encourage muddled/sloppy 
programming, as people become careless about the type of the return value.
   
   There are several options for correcting this:
   
   1: Fix the code so the `IdentitySerDe` is just that - it leaves the bytes 
unchanged. One could then have a `Paddington Bear SerDe` (well intentioned and 
helpful but tends to get things wrong) which does what the existing 
`IdentitySerDe` does and also have `FloatSerDe`, `IntSerDe` and `StringSerDe` 
to cover the other cases reliably.
   
   2: Change the documentation on the `IdentitySerDe` to explain what it really 
does and its dangers but leave it as the default. Introduce `FloatSerDe`, 
`IntSerDe`, `StringSerDe` and `BytesSerDe` to cover the cases reliably.
   
   3: Fix the code so the `IdentitySerDe` is just that - it leaves the bytes 
unchanged. Also have `FloatSerDe`, `IntSerDe` and `StringSerDe` to cover the 
other cases reliably. Make `StringSerDe` the default on the guess this is the 
most common use case.
   
   My preference would be option 1, but I suspect the installed code base would 
need option 2. Option 3 is a sort of compromise.
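   
   As a rough sketch of what the explicit SerDes proposed in the options above 
could look like: the `SerDe` base class below is a stand-in defined here so the 
example is self-contained, `StrictIdentitySerDe` is a hypothetical name, and 
none of these classes exist in Pulsar today as far as this issue is concerned.

```python
from abc import ABC, abstractmethod


class SerDe(ABC):
    """Stand-in interface for illustration; not imported from Pulsar."""

    @abstractmethod
    def serialize(self, value): ...

    @abstractmethod
    def deserialize(self, input_bytes: bytes): ...


class StrictIdentitySerDe(SerDe):
    """Hypothetical: passes bytes through untouched in both directions."""

    def serialize(self, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError(f"expected bytes, got {type(value).__name__}")
        return bytes(value)

    def deserialize(self, input_bytes):
        return input_bytes


class StringSerDe(SerDe):
    """Hypothetical: always UTF-8 text; fails loudly on anything else."""

    def serialize(self, value):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        return value.encode("utf-8")

    def deserialize(self, input_bytes):
        return input_bytes.decode("utf-8")


class IntSerDe(SerDe):
    """Hypothetical: integers encoded as decimal text."""

    def serialize(self, value):
        if not isinstance(value, int):
            raise TypeError(f"expected int, got {type(value).__name__}")
        return str(value).encode("utf-8")

    def deserialize(self, input_bytes):
        return int(input_bytes.decode("utf-8"))


class FloatSerDe(SerDe):
    """Hypothetical: floats encoded with repr() so they round-trip exactly."""

    def serialize(self, value):
        if not isinstance(value, float):
            raise TypeError(f"expected float, got {type(value).__name__}")
        return repr(value).encode("utf-8")

    def deserialize(self, input_bytes):
        return float(input_bytes.decode("utf-8"))
```

   With classes along these lines, a payload such as `b"42"` stays as bytes 
under the identity SerDe, and a function that returns the wrong type gets a 
`TypeError` instead of a silent conversion.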
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

