hoshinojyunn commented on code in PR #64794:
URL: https://github.com/apache/doris/pull/64794#discussion_r3491321928
##########
fe/fe-core/src/main/java/org/apache/doris/analysis/InvertedIndexUtil.java:
##########
@@ -134,15 +134,43 @@ public static void checkInvertedIndexParser(String indexColName, PrimitiveType c
}
}
- private static boolean isSingleByte(String str) {
+ private static boolean isAscii(String str) {
for (int i = 0; i < str.length(); i++) {
- if (str.charAt(i) > 0xFF) {
+ if (str.charAt(i) > 0x7F) {
return false;
}
}
return true;
}
+ public static void checkCharFilterProperties(Map<String, String> properties) throws AnalysisException {
Review Comment:
Fixed in d5f02872309. `CharReplaceCharFilterValidator` now maps policy
`pattern`/`replacement` to the shared legacy char-filter property keys and
reuses `InvertedIndexUtil.checkCharFilterProperties()`, so the
custom-analyzer/index-policy path now enforces the same ASCII-only replacement
rule as table properties and `TOKENIZE()`. I also added FE coverage in
`PolicyValidatorTests` for non-ASCII replacement and a regression case in
`regression-test/suites/inverted_index_p0/analyzer/test_char_filter_policy_validation.groovy`.
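For readers of this thread: the hunk narrows the helper from `> 0xFF` (Latin-1) to `> 0x7F` (ASCII). A replacement such as `é` (U+00E9) passed the old check and is rejected by the new one. Below is a minimal sketch of the "one shared validator" idea. It is not the code in d5f02872309. The legacy key names (`char_filter_type`, `char_filter_pattern`, `char_filter_replacement`), the error message, the adapter `validateCharReplacePolicy`, and the local `AnalysisException` stand-in are all assumptions for illustration. The real `checkCharFilterProperties` body is not shown in the diff and likely checks more than the replacement.

```java
import java.util.HashMap;
import java.util.Map;

public class CharFilterCheckSketch {
    // Hypothetical stand-in for Doris' AnalysisException so the sketch compiles alone.
    public static class AnalysisException extends Exception {
        public AnalysisException(String msg) { super(msg); }
    }

    // Assumed legacy table-property keys; verify against InvertedIndexUtil.
    static final String CHAR_FILTER_TYPE = "char_filter_type";
    static final String CHAR_FILTER_PATTERN = "char_filter_pattern";
    static final String CHAR_FILTER_REPLACEMENT = "char_filter_replacement";

    // Same loop as the patched helper: any UTF-16 code unit above 0x7F is non-ASCII.
    public static boolean isAscii(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical body: only the ASCII-only replacement rule is modeled here.
    public static void checkCharFilterProperties(Map<String, String> properties) throws AnalysisException {
        String replacement = properties.get(CHAR_FILTER_REPLACEMENT);
        if (replacement != null && !isAscii(replacement)) {
            throw new AnalysisException("char_filter_replacement must contain only ASCII characters");
        }
    }

    // Hypothetical adapter: maps policy keys "pattern"/"replacement" onto the legacy
    // keys so the index-policy path and table properties share one validator.
    public static void validateCharReplacePolicy(Map<String, String> policyProps) throws AnalysisException {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(CHAR_FILTER_TYPE, "char_replace");
        if (policyProps.containsKey("pattern")) {
            legacy.put(CHAR_FILTER_PATTERN, policyProps.get("pattern"));
        }
        if (policyProps.containsKey("replacement")) {
            legacy.put(CHAR_FILTER_REPLACEMENT, policyProps.get("replacement"));
        }
        checkCharFilterProperties(legacy);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("._ "));     // true
        System.out.println(isAscii("\u00e9"));  // false: 0xE9 > 0x7F (the old 0xFF check accepted it)
        try {
            validateCharReplacePolicy(Map.of("pattern", "._", "replacement", "\u00e9"));
            System.out.println("accepted");
        } catch (AnalysisException e) {
            System.out.println("rejected");     // this branch runs
        }
    }
}
```

Routing the policy path through the table-property validator, instead of duplicating the check, keeps the two entry points from drifting apart again.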
--
This is an automated message from the Apache Git Service.