ziyanTOP commented on PR #4413:
URL: https://github.com/apache/flink-cdc/pull/4413#issuecomment-4571114699

   @lvyanquan Thanks for the review and the great suggestion.
   
   I agree that automatically detecting the collation is the ideal long-term 
experience. However, a few practical constraints make always-on 
auto-detection difficult to deliver in this PR:
   
   1. **Scope mismatch**: `SHOW VARIABLES LIKE 'collation_server'` only returns 
the **server-level** default. In MySQL, collation can be overridden at the 
database, table, and even **column** level. The chunk-split logic actually 
needs the collation of the specific chunk-key column, not the server default.
   
   2. **Multi-table overhead**: A Pipeline job often captures dozens of tables. 
To auto-detect correctly, we would need to query `information_schema.COLUMNS` 
for every table during snapshot initialization, map each MySQL collation name 
to a Java comparison strategy, and handle mixed collations for composite 
primary keys. This adds non-trivial startup latency and state complexity.
   
   3. **User override**: Some users may want to force specific comparison 
semantics regardless of the MySQL collation (e.g., for performance tuning or 
cross-version compatibility).
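
   To make points 1 and 2 concrete, here is a rough sketch of what a 
per-column lookup and mapping could look like. `CollationSketch`, 
`CompareMode`, and `classify` are hypothetical names for illustration only, 
not code from this PR. The sketch relies solely on MySQL's collation suffix 
convention (`_ci`, `_cs`, `_bin`) and ignores accent sensitivity and pad 
attributes, which a real implementation would also have to consider.

```java
import java.util.Locale;

final class CollationSketch {

    /** Hypothetical compare modes; not the option values used in this PR. */
    enum CompareMode { DEFAULT, CASE_INSENSITIVE, BINARY }

    /**
     * Column-level lookup. COLLATION_NAME is NULL for non-character columns,
     * so the caller must be prepared for a null result.
     */
    static final String COLUMN_COLLATION_SQL =
            "SELECT COLLATION_NAME FROM information_schema.COLUMNS "
                    + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? AND COLUMN_NAME = ?";

    /** Maps a MySQL collation name to a compare mode using its suffix only. */
    static CompareMode classify(String collationName) {
        if (collationName == null) {
            return CompareMode.DEFAULT; // non-character column or lookup miss
        }
        String name = collationName.toLowerCase(Locale.ROOT);
        if (name.equals("binary") || name.endsWith("_bin")) {
            return CompareMode.BINARY;
        }
        if (name.endsWith("_ci")) {
            return CompareMode.CASE_INSENSITIVE;
        }
        // "_cs" and anything unrecognized keep the existing behavior.
        return CompareMode.DEFAULT;
    }
}
```

   Even in this reduced form, the lookup has to run once per chunk-key column 
of every captured table, which is where the startup cost mentioned above comes 
from.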
   
   Therefore, the current explicit configuration is the safest and most 
backward-compatible fix. That said, I think adding an `auto` mode is a valuable 
**follow-up enhancement** — we can automatically detect each table's chunk-key 
column collation at snapshot time, persist the per-table compare mode in the 
split state, and fall back to `default` when detection fails. I'd be happy to 
create a separate Jira ticket and PR for that.
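
   As a rough illustration of the proposed follow-up, the fallback behavior 
could be sketched as below. `AutoCompareModeSketch`, `resolve`, and 
`snapshotState` are hypothetical names, the detector is an injected function 
standing in for the real metadata query, and the mode strings other than 
`default` are placeholders. Nothing here exists in the current PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class AutoCompareModeSketch {

    static final String FALLBACK_MODE = "default";

    // Per-table compare mode, resolved at most once per table.
    private final Map<String, String> modeByTable = new ConcurrentHashMap<>();

    // Hypothetical detector: returns the mode for a table id, or empty if unknown.
    private final Function<String, Optional<String>> detector;

    AutoCompareModeSketch(Function<String, Optional<String>> detector) {
        this.detector = detector;
    }

    /** Resolves the mode for a table, falling back to "default" on any failure. */
    String resolve(String tableId) {
        return modeByTable.computeIfAbsent(tableId, id -> {
            try {
                return detector.apply(id).orElse(FALLBACK_MODE);
            } catch (RuntimeException e) {
                // Detection failed (e.g. metadata query error): keep old behavior.
                return FALLBACK_MODE;
            }
        });
    }

    /** Read-only copy of the resolved modes, e.g. to persist with the split state. */
    Map<String, String> snapshotState() {
        return Collections.unmodifiableMap(new HashMap<>(modeByTable));
    }
}
```

   Caching the result per table means a restored job could reuse the persisted 
mode instead of detecting it again, so chunk boundaries stay consistent across 
restarts.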
   
   Please let me know if the current approach looks acceptable.


