Hi all,

It looks like the UTF-8 string coders in the Java and Python SDKs use
different encoding schemes. StringUtf8Coder in the Java SDK writes the
varint-encoded length of the input string before the actual data bytes,
whereas StrUtf8Coder in the Python SDK encodes the input string directly
to bytes with no length prefix. For the last few weeks I've been testing
and fixing cross-language IO transforms, and this discrepancy is a major
blocker for me. IMO, we should unify the encoding scheme for UTF-8
strings across the SDKs and make it a standard coder. Any thoughts?
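
To make the mismatch concrete, here is a rough sketch of the two byte
layouts as I understand them. It is plain Python and does not call either
SDK; encode_varint, encode_length_prefixed and encode_raw are hypothetical
helper names I made up for illustration, and the varint is assumed to be
the usual base-128 form (7 bits per byte, low-order group first).

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (assumed format)."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            # More groups follow, so set the continuation bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_length_prefixed(s: str) -> bytes:
    """Hypothetical stand-in for the Java-side layout: varint length, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def encode_raw(s: str) -> bytes:
    """Hypothetical stand-in for the Python-side layout: UTF-8 bytes only."""
    return s.encode("utf-8")


if __name__ == "__main__":
    print(encode_length_prefixed("beam"))  # b'\x04beam'
    print(encode_raw("beam"))              # b'beam'
```

A reader expecting one layout will misparse the other: the extra leading
byte is either consumed as part of the string or the first character is
read as a length.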

Thanks,
