cs17899219 opened a new pull request, #621:
URL: https://github.com/apache/doris-flink-connector/pull/621
# Proposed changes
Issue Number: close #620
## Problem Summary:
The current logic in `TypeConverter.java` uses a multiplier of `3` to
calculate the required byte length for the Doris `VARCHAR` type:
```java
// Current implementation
return length * 3 > 65533
        ? DorisType.STRING
        : String.format("%s(%s)", DorisType.VARCHAR, length * 3);
```
This assumes a maximum of 3 bytes per character, which is insufficient for
the widely used utf8mb4 character set (common in MySQL/MariaDB and other
sources). The utf8mb4 encoding supports the full range of Unicode characters
(including emojis), requiring up to 4 bytes per character.
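The byte counts behind this can be checked directly. The following is a minimal sketch, not connector code: the class and method names are hypothetical, and it only uses the JDK to measure how many UTF-8 bytes a few sample characters occupy.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of doris-flink-connector.
class Utf8ByteLengthDemo {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of Unicode code points, which is what a character length counts. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "a";             // U+0061
        String cjk = "\u4e2d";          // U+4E2D
        String emoji = "\uD83D\uDE00";  // U+1F600, written as a surrogate pair

        System.out.println(utf8Bytes(ascii)); // 1
        System.out.println(utf8Bytes(cjk));   // 3
        System.out.println(utf8Bytes(emoji)); // 4

        // Ten emoji fit a character length of 10 but need 40 bytes,
        // more than the 30 bytes a multiplier of 3 would reserve.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(emoji);
        }
        String tenEmoji = sb.toString();
        System.out.println(codePoints(tenEmoji) + " chars, "
                + utf8Bytes(tenEmoji) + " bytes");
    }
}
```

Note that `String.length()` counts UTF-16 code units, so the sketch uses `codePointCount` to get the character count.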
If a source column contains 4-byte characters, the calculated byte length
can be smaller than the number of bytes actually required, leading to:
- Data truncation or corruption during synchronization.
- Load failures with errors such as "data length exceeded" or "row size too
  large" when Doris enforces the byte limit.
## Proposed Solution
This change raises the byte multiplier from 3 to 4 so that the calculated
`VARCHAR` byte length covers any utf8mb4 string of the declared character
length. This avoids the truncation and load failures caused by an undersized
column.
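As an illustration of the effect, here is a minimal sketch of the conversion with the multiplier set to 4. It is not the actual `TypeConverter.java` code: the class name, method name, and string constants are hypothetical stand-ins for `DorisType.VARCHAR` and `DorisType.STRING`, and only the 65533 limit and the format string come from the snippet above.

```java
// Hypothetical sketch; names do not come from doris-flink-connector.
class VarcharLengthSketch {
    // Assumed stand-ins for DorisType.VARCHAR and DorisType.STRING.
    static final String VARCHAR = "VARCHAR";
    static final String STRING = "STRING";

    // Byte limit taken from the snippet in the problem summary.
    static final int MAX_VARCHAR_BYTES = 65533;
    // Worst-case UTF-8 bytes per character under utf8mb4.
    static final int BYTES_PER_CHAR = 4;

    /** Maps a source character length to a Doris type string. */
    static String toDorisStringType(long charLength) {
        // long arithmetic is a choice made in this sketch to rule out overflow.
        long byteLength = charLength * BYTES_PER_CHAR;
        return byteLength > MAX_VARCHAR_BYTES
                ? STRING
                : String.format("%s(%s)", VARCHAR, byteLength);
    }

    public static void main(String[] args) {
        System.out.println(toDorisStringType(10));    // VARCHAR(40)
        System.out.println(toDorisStringType(16383)); // VARCHAR(65532)
        System.out.println(toDorisStringType(16384)); // STRING
    }
}
```

One consequence of the arithmetic: with a multiplier of 3 the switch to `STRING` happens at a character length of 21845, while with 4 it happens at 16384, so source lengths from 16384 to 21844 would map to `STRING` instead of `VARCHAR`.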
## Checklist(Required)
1. Does it affect the original behavior: No
2. Has unit tests been added: No
3. Has document been added or modified: No
4. Does it need to update dependencies: No
5. Are there any changes that cannot be rolled back: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]