hoshinojyunn commented on code in PR #64794:
URL: https://github.com/apache/doris/pull/64794#discussion_r3491321928
##########
fe/fe-core/src/main/java/org/apache/doris/analysis/InvertedIndexUtil.java:
##########
@@ -134,15 +134,43 @@ public static void checkInvertedIndexParser(String indexColName, PrimitiveType c
         }
     }
-    private static boolean isSingleByte(String str) {
+    private static boolean isAscii(String str) {
         for (int i = 0; i < str.length(); i++) {
-            if (str.charAt(i) > 0xFF) {
+            if (str.charAt(i) > 0x7F) {
                 return false;
             }
         }
         return true;
     }
+    public static void checkCharFilterProperties(Map<String, String> properties) throws AnalysisException {
Review Comment:
Fixed.
`CharReplaceCharFilterValidator` now maps policy `pattern`/`replacement` to
the shared legacy char-filter property keys and reuses
`InvertedIndexUtil.checkCharFilterProperties()`, so the
custom-analyzer/index-policy path now enforces the same ASCII-only replacement
rule as table properties and `TOKENIZE()`.
Added FE coverage in `PolicyValidatorTests` for non-ASCII replacement and a
custom-analyzer regression case in `test_custom_analyzer1.groovy`.
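
For readers following along, here is a rough sketch of the delegation described above. It is not the code from the PR. The property key strings, the `validatePolicy` method, the `char_replace` type value, and the use of `IllegalArgumentException` in place of `AnalysisException` are all assumptions made for illustration, and only the ASCII check on the replacement is modeled.

```java
import java.util.HashMap;
import java.util.Map;

class CharFilterSketch {
    // Assumed names for the shared legacy char-filter keys; the real
    // constants live in InvertedIndexUtil and may differ.
    static final String CHAR_FILTER_TYPE = "char_filter_type";
    static final String CHAR_FILTER_PATTERN = "char_filter_pattern";
    static final String CHAR_FILTER_REPLACEMENT = "char_filter_replacement";

    // Same loop as the isAscii helper in the diff: reject any char above 0x7F.
    static boolean isAscii(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stand-in for checkCharFilterProperties(); it models only
    // the ASCII-only rule on the replacement value.
    static void checkCharFilterProperties(Map<String, String> properties) {
        String replacement = properties.get(CHAR_FILTER_REPLACEMENT);
        if (replacement != null && !isAscii(replacement)) {
            throw new IllegalArgumentException(
                    "char filter replacement must be ASCII: " + replacement);
        }
    }

    // Hypothetical policy-side validator: translate the policy keys
    // `pattern`/`replacement` to the legacy keys, then delegate to the shared check.
    static void validatePolicy(Map<String, String> policyProps) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(CHAR_FILTER_TYPE, "char_replace"); // assumed type value
        if (policyProps.containsKey("pattern")) {
            legacy.put(CHAR_FILTER_PATTERN, policyProps.get("pattern"));
        }
        if (policyProps.containsKey("replacement")) {
            legacy.put(CHAR_FILTER_REPLACEMENT, policyProps.get("replacement"));
        }
        checkCharFilterProperties(legacy);
    }

    public static void main(String[] args) {
        // ASCII replacement passes through the shared check.
        validatePolicy(Map.of("pattern", "._", "replacement", " "));
        try {
            // U+00E9 is above 0x7F, so the shared check rejects it.
            validatePolicy(Map.of("pattern", "._", "replacement", "\u00e9"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second call uses U+00E9 on purpose: it is below 0xFF, so the old `isSingleByte` bound would have accepted it, while the new 0x7F bound rejects it.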
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]