cs17899219 opened a new pull request, #621:
URL: https://github.com/apache/doris-flink-connector/pull/621

   # Proposed changes
   
   Issue Number: close #620
   
   ## Problem Summary:
   
   The current logic in `TypeConverter.java` uses a multiplier of `3` to 
calculate the required byte length for the Doris `VARCHAR` type:
   
   ```java
   // Current implementation
   return length * 3 > 65533
           ? DorisType.STRING
           : String.format("%s(%s)", DorisType.VARCHAR, length * 3);
   ```
   
   This assumes a maximum of 3 bytes per character, which is insufficient for 
the widely used utf8mb4 character set (common in MySQL/MariaDB and other 
sources). The utf8mb4 encoding supports the full range of Unicode characters 
(including emojis), requiring up to 4 bytes per character.
   
   If a source column contains 4-byte characters, the calculated byte length 
may underestimate the required size, leading to:
   
   - Data truncation or corruption during the synchronization process.
   - Load failures with errors such as "data length exceeded" or "row size too 
large" when Doris enforces the byte limit.
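
To make the underestimate concrete, here is a minimal sketch that measures UTF-8 byte lengths with the standard library only. The class name `Utf8ByteLengthDemo`, the helper `utf8Bytes`, and the 10-character column are hypothetical and not part of the connector; `String.repeat` assumes Java 11 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of doris-flink-connector.
class Utf8ByteLengthDemo {

    // Number of bytes a string occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // A value made of `charLength` copies of U+1F600, a code point outside
    // the Basic Multilingual Plane that needs 4 bytes in UTF-8.
    static String emojiValue(int charLength) {
        String emoji = "\uD83D\uDE00"; // surrogate pair for U+1F600
        return emoji.repeat(charLength);
    }

    public static void main(String[] args) {
        int charLength = 10; // assumed source column: VARCHAR(10) in utf8mb4
        String value = emojiValue(charLength);

        int actualBytes = utf8Bytes(value);
        int oldEstimate = charLength * 3; // multiplier used by the current code
        int newEstimate = charLength * 4; // multiplier proposed in this PR

        System.out.println("actual bytes:       " + actualBytes);
        System.out.println("estimate with x3:   " + oldEstimate);
        System.out.println("estimate with x4:   " + newEstimate);
        System.out.println("x3 is large enough: " + (actualBytes <= oldEstimate));
        System.out.println("x4 is large enough: " + (actualBytes <= newEstimate));
    }
}
```

A value that fits the source column's character limit can therefore need more bytes than `length * 3` provides, while `length * 4` covers it.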
   
   ## Proposed Solution
   This change updates the byte multiplier from 3 to 4 so that the calculated 
byte length covers the utf8mb4 worst case of 4 bytes per character. The 
resulting `VARCHAR` is then always large enough for the declared character 
length, which prevents the truncation and load failures described above.
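
As a rough illustration of the proposed mapping, the following self-contained sketch mirrors the snippet from the problem summary with the multiplier changed to 4. It is not the actual diff: the class `VarcharLengthSketch`, the method `toDorisStringType`, and the string constants standing in for `DorisType.STRING` and `DorisType.VARCHAR` are assumptions made for this example, and the `long` arithmetic is an extra precaution against integer overflow that the original snippet does not have.

```java
// Hypothetical sketch, not the actual TypeConverter.java change.
class VarcharLengthSketch {

    // Stand-ins for DorisType.STRING and DorisType.VARCHAR (assumed values).
    static final String STRING = "STRING";
    static final String VARCHAR = "VARCHAR";

    // Byte limit taken from the comparison in the PR's snippet.
    static final int MAX_VARCHAR_BYTES = 65533;

    // utf8mb4 needs at most 4 bytes per character.
    static final int MAX_BYTES_PER_CHAR = 4;

    // Maps a source character length to a Doris string type.
    static String toDorisStringType(int length) {
        // Widen to long so very large source lengths cannot overflow.
        long byteLength = (long) length * MAX_BYTES_PER_CHAR;
        return byteLength > MAX_VARCHAR_BYTES
                ? STRING
                : String.format("%s(%s)", VARCHAR, byteLength);
    }

    public static void main(String[] args) {
        System.out.println(toDorisStringType(10));    // small column
        System.out.println(toDorisStringType(16383)); // largest length that stays VARCHAR
        System.out.println(toDorisStringType(16384)); // first length that falls back to STRING
    }
}
```

One consequence worth noting under these assumptions: with a multiplier of 3 the fallback to `STRING` starts at a source length of 21845, whereas with 4 it starts at 16384, so source columns with lengths from 16384 to 21844 would map to `STRING` instead of `VARCHAR`.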
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: No
   2. Has unit tests been added: No
   3. Has document been added or modified: No
   4. Does it need to update dependencies: No
   5. Are there any changes that cannot be rolled back: No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

