cs17899219 opened a new issue, #620:
URL: https://github.com/apache/doris-flink-connector/issues/620

   ### Search before asking
   
   - [x] I have searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   The current logic in `TypeConverter.java` converts a source `VARCHAR` character length into a Doris `VARCHAR` byte length by multiplying it by `3`, and falls back to `STRING` when the result exceeds 65533 bytes:
   
   ```java
   // Current implementation
   return length * 3 > 65533
           ? DorisType.STRING
           : String.format("%s(%s)", DorisType.VARCHAR, length * 3);
   ```
   
   This assumes at most 3 bytes per character, which is insufficient for the widely used utf8mb4 character set (common in MySQL/MariaDB and other sources). utf8mb4 covers the full range of Unicode characters, including emojis, and characters outside the Basic Multilingual Plane take 4 bytes each in UTF-8.
   
   If a source column contains 4-byte characters, the calculated byte length can be smaller than the data actually occupies. For example, a `VARCHAR(10)` column holding ten emojis needs 40 bytes but is mapped to `VARCHAR(30)`. This leads to:
   
   - Data truncation or corruption during synchronization.
   - Load failures with errors such as "data length exceeded" or "row size too large" when Doris enforces the byte limit.
   
   
   ### Solution
   
   This change raises the byte multiplier from 3 to 4 so that the full utf8mb4 character set fits. Because UTF-8 never uses more than 4 bytes per character, the calculated byte length is then always sufficient for the declared character length, which preserves data integrity and prevents these sync failures.
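   A minimal sketch of the proposed mapping, assuming the same 65533-byte threshold as the current code. `VarcharLengthMapper`, `toDorisType` and the plain `"VARCHAR"`/`"STRING"` literals are hypothetical stand-ins for the real `TypeConverter` method and `DorisType` constants; the multiplier is a parameter so the old and new behaviour can be compared side by side:

   ```java
   public class VarcharLengthMapper {

       // Byte threshold used by the current TypeConverter logic.
       static final int MAX_VARCHAR_BYTES = 65533;

       // Worst-case UTF-8 bytes per character: 3 covers only the BMP
       // (utf8mb3), 4 covers every Unicode code point (utf8mb4).
       static final int UTF8MB3_BYTES_PER_CHAR = 3;
       static final int UTF8MB4_BYTES_PER_CHAR = 4;

       // Hypothetical helper mirroring the snippet above with an explicit
       // multiplier; long arithmetic rules out int overflow for any input.
       public static String toDorisType(int charLength, int bytesPerChar) {
           long byteLength = (long) charLength * bytesPerChar;
           return byteLength > MAX_VARCHAR_BYTES
                   ? "STRING"
                   : "VARCHAR(" + byteLength + ")";
       }

       public static void main(String[] args) {
           int[] lengths = {2, 255, 16383, 16384, 21844, 21845};
           for (int len : lengths) {
               System.out.println("VARCHAR(" + len + ") -> x3: "
                       + toDorisType(len, UTF8MB3_BYTES_PER_CHAR)
                       + ", x4: " + toDorisType(len, UTF8MB4_BYTES_PER_CHAR));
           }
       }
   }
   ```

   With a multiplier of 4, character lengths up to 16383 still map to `VARCHAR` (16383 * 4 = 65532 bytes), while lengths from 16384 to 21844, which map to `VARCHAR` under the 3x rule, now fall back to `STRING`.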
   
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

