hoshinojyunn commented on code in PR #64794:
URL: https://github.com/apache/doris/pull/64794#discussion_r3503175383


##########
fe/fe-core/src/main/java/org/apache/doris/indexpolicy/CharReplaceCharFilterValidator.java:
##########
@@ -39,25 +42,20 @@ protected String getTypeName() {
 
     @Override
     protected void validateSpecific(Map<String, String> props) throws 
DdlException {
+        Map<String, String> charFilterProperties = new HashMap<>();
+        
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_TYPE,
+                InvertedIndexUtil.INVERTED_INDEX_CHAR_FILTER_CHAR_REPLACE);
         if (props.containsKey("pattern")) {
-            String pattern = props.get("pattern");
-            if (pattern != null && !pattern.isEmpty()) {
-                for (int i = 0; i < pattern.length(); i++) {
-                    if (pattern.charAt(i) > 255) {
-                        throw new DdlException(
-                                "pattern must contain only single-byte 
characters in [0,255]");
-                    }
-                }
-            }
+            
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_PATTERN,
 props.get("pattern"));
         }
         if (props.containsKey("replacement")) {
-            String replacement = props.get("replacement");
-            if (replacement == null || replacement.length() != 1) {
-                throw new DdlException("replacement must be exactly one byte");
-            }
-            if (replacement.charAt(0) > 255) {
-                throw new DdlException("replacement must be in [0,255]");
-            }
+            
charFilterProperties.put(InvertedIndexUtil.INVERTED_INDEX_PARSER_CHAR_FILTER_REPLACEMENT,
+                    props.get("replacement"));
+        }
+        try {

Review Comment:
   Allowing the `pattern` to be omitted in the `custom-analyzer` appears to be 
a design flaw; validation in the older analyzer version did not permit an 
omitted `pattern`, whereas the `custom-analyzer` defaults to `",._"` when the 
`pattern` is missing—which makes no sense. Therefore, I prefer to mandate that 
the `pattern` cannot be omitted, while using a space (`" "`) as the default 
when `replacement` is omitted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to