github-actions[bot] commented on code in PR #64794:
URL: https://github.com/apache/doris/pull/64794#discussion_r3502395460
##########
fe/fe-core/src/main/java/org/apache/doris/indexpolicy/CharReplaceCharFilterValidator.java:
##########
@@ -39,25 +42,20 @@ protected String getTypeName() {
@Override
protected void validateSpecific(Map<String, String> props) throws
DdlException {
+ Map<String, String> charFilterProperties = new HashMap<>();
+
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_TYPE,
+ InvertedIndexUtil.INVERTED_INDEX_CHAR_FILTER_CHAR_REPLACE);
if (props.containsKey("pattern")) {
- String pattern = props.get("pattern");
- if (pattern != null && !pattern.isEmpty()) {
- for (int i = 0; i < pattern.length(); i++) {
- if (pattern.charAt(i) > 255) {
- throw new DdlException(
- "pattern must contain only single-byte
characters in [0,255]");
- }
- }
- }
+
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_PATTERN,
props.get("pattern"));
}
if (props.containsKey("replacement")) {
- String replacement = props.get("replacement");
- if (replacement == null || replacement.length() != 1) {
- throw new DdlException("replacement must be exactly one byte");
- }
- if (replacement.charAt(0) > 255) {
- throw new DdlException("replacement must be in [0,255]");
- }
+
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_REPLACEMENT,
+ props.get("replacement"));
+ }
+ try {
Review Comment:
Reusing `checkCharFilterProperties()` here changes the custom-analyzer
policy contract for `char_replace`: a policy such as `CREATE INVERTED INDEX
CHAR_FILTER cf PROPERTIES("type"="char_replace")` used to be valid, because FE
stores the policy properties unchanged, BE strips only `type` when building
`Settings`, and `CharReplaceCharFilterFactory` defaults a missing `pattern` to
`,._`. After this bridge map always supplies legacy `char_filter_type`, the
shared helper rejects the same policy with `Missing 'char_filter_pattern'...`
before BE can apply that default. Please keep the stricter byte/ASCII
replacement checks, but do not require `pattern` for the policy path unless the
BE default is also removed intentionally and existing custom-analyzer usages
are migrated/tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]