kitalkuyo-gita commented on code in PR #674:
URL: https://github.com/apache/geaflow/pull/674#discussion_r2532614361


##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
 
     private static final String DEFAULT_N_PREDICT = "128";
 
-    private static final String TEMPLATE = "{"
+    private static final String JSON_REQUEST_TEMPLATE = "{"

Review Comment:
   The default predict handling is unromantic: the class contains the constant 
`DEFAULT_N_PREDICT` ("128"), but the code directly inserts it using the Integer 
value from `llm.getPredict()`. If `llm.getPredict()` returns null, it inserts 
"null" (the result of `String.format`), resulting in a null value for 
`n_predict` in the generated JSON object—likely not the behavior expected by 
the backend. `DEFAULT_N_PREDICT` is also unused, appearing redundant.
   
   Recommendation: Use the default value of `int` (128) instead of the string 
`DEFAULT_N_PREDICT`, or change `DEFAULT_N_PREDICT` to `int` and use a fallback 
in `getJsonString` when `llm.getPredict()` returns null.



##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
 
     private static final String DEFAULT_N_PREDICT = "128";
 
-    private static final String TEMPLATE = "{"
+    private static final String JSON_REQUEST_TEMPLATE = "{"
         + "\"prompt\": \"%s\","
         + "\"n_predict\": %s}";
 
     private String getJsonString(LocalConfigArgsClass llm, String prompt) {
         Integer predict = llm.getPredict();
         // trim() to replace the last \n 
-        return String.format(TEMPLATE, prompt.trim(), predict);

Review Comment:
   The code directly formats the user-provided prompt into a JSON string 
("prompt": "%s") using `String.format` without performing JSON escaping (e.g., 
double quotes, backslashes, newlines, control characters, etc.). If the prompt 
contains double quotes or backslashes, it will result in invalid JSON and may 
even lead to SQL injection vulnerabilities.
   
   Recommendation: Use an existing JSON utility library (such as Jackson or 
Gson) to construct JSON instead of manually concatenating strings. This will 
automatically handle escaping and type encoding.



##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
 
     private static final String DEFAULT_N_PREDICT = "128";
 
-    private static final String TEMPLATE = "{"
+    private static final String JSON_REQUEST_TEMPLATE = "{"
         + "\"prompt\": \"%s\","
         + "\"n_predict\": %s}";
 
     private String getJsonString(LocalConfigArgsClass llm, String prompt) {
         Integer predict = llm.getPredict();
         // trim() to replace the last \n 
-        return String.format(TEMPLATE, prompt.trim(), predict);

Review Comment:
   example A:
   
   ```
   private String getJsonString(LocalConfigArgsClass llm, String prompt) {
       int predict = llm.getPredict() == null ? 128 : llm.getPredict();
       ObjectNode root = new ObjectMapper().createObjectNode();
       root.put("prompt", prompt.trim());
       root.put("n_predict", predict);
       return root.toString();
   }
   ```
   
   example B(Without importing any libraries, perform minimal JSON escaping + 
default):
   
   ```
   private static final int DEFAULT_N_PREDICT_INT = 128;
   private static String escapeJsonString(String s) {
       StringBuilder sb = new StringBuilder();
       for (int i = 0; i < s.length(); i++) {
           char ch = s.charAt(i);
           switch (ch) {
               case '"': sb.append("\\\""); break;
               case '\\': sb.append("\\\\"); break;
               case '\b': sb.append("\\b"); break;
               case '\f': sb.append("\\f"); break;
               case '\n': sb.append("\\n"); break;
               case '\r': sb.append("\\r"); break;
               case '\t': sb.append("\\t"); break;
               default:
                   if (ch < 0x20) {
                       sb.append(String.format("\\u%04x", (int) ch));
                   } else {
                       sb.append(ch);
                   }
           }
       }
       return sb.toString();
   }
   
   private String getJsonString(LocalConfigArgsClass llm, String prompt) {
       int predict = llm.getPredict() == null ? DEFAULT_N_PREDICT_INT : 
llm.getPredict();
       return String.format("{\"prompt\":\"%s\",\"n_predict\":%d}", 
escapeJsonString(prompt.trim()), predict);
   }
   ```
   



##########
geaflow/geaflow-dsl/geaflow-dsl-parser/src/main/java/org/apache/geaflow/dsl/util/StringLiteralUtil.java:
##########
@@ -56,7 +56,7 @@ public static String unescapeSQLString(String b) {
                 int base = i + 2;
                 for (int j = 0; j < 4; j++) {
                     int digit = Character.digit(b.charAt(j + base), 16);
-                    code += digit * multiplier[j];
+                    code += digit * UNICODE_HEX_MULTIPLIER[j];

Review Comment:
   The original `multiplier` array contained `{1000, 100, 10, 1}`—clearly 
decimal weights, not hexadecimal weights. Unicode `\uXXXX` should be calculated 
using 16^3, 16^2, 16^1, 16^0, which is `{4096, 256, 16, 1}`. The current code 
(regardless of renaming) calculates the wrong character code point, causing 
`\uXXXX` to be parsed incorrectly. This is a logical error.
   
   Furthermore, there is no validation for `Character.digit(...)` returning -1. 
If a non-hexadecimal character is encountered, a negative value will be 
included in the calculation, resulting in an incorrect result without throwing 
an explicit exception.
   
   There is no validation for code that may be generated outside the `char` 
range (due to inappropriate surrogate). However, since only four hexadecimal 
numbers are parsed, their range is 0..0xFFFF, which falls within the `char` 
range. The main issues are incorrect weighting and handling of invalid 
characters.
   
   Alternatively, simply adjust the weights (not recommended, only a remedy):
   
   Change `multiplier` to `{4096, 256, 16, 1}`, and check `digit < 0` after 
`Character.digit`.
   
   Testing Recommendations
   
   Add unit tests to cover:
   
   - `"\u0041" => "A"`
   
   - `"\u00e9" => "é"`
   - Confirm the expected behavior (throw an exception or fallback) for input 
containing illegal `\u` escape sequences (less than 4 characters or non-hex 
characters).
   
   - Handling common escape sequences like `\n` and `\t`.
   
   exampleA:
   
   ```
   public static String unescapeSQLString(String b) {
       StringBuilder sb = new StringBuilder(b.length());
       for (int i = 0; i < b.length(); i++) {
           char ch = b.charAt(i);
           if (ch == '\\' && i + 1 < b.length()) {
               char next = b.charAt(i + 1);
               if (next == 'u' || next == 'U') {
                   if (i + 6 <= b.length()) { // \u + 4 hex chars
                       String hex = b.substring(i + 2, i + 6);
                       try {
                           int code = Integer.parseInt(hex, 16);
                           sb.append((char) code);
                           i += 5;
                           continue;
                       } catch (NumberFormatException e) {
                           // 解析失败:可以选择抛出异常或保守回退(将原文本原样加入)
                           // 这里选择把原文本加入并继续
                           sb.append('\\').append(next);
                           i++;
                           continue;
                       }
                   } else {
                       // 不足 4 个字符,按原样处理
                       sb.append('\\').append(next);
                       i++;
                       continue;
                   }
               }
               // 处理常见转义
               switch (next) {
                   case 'n': sb.append('\n'); i++; break;
                   case 't': sb.append('\t'); i++; break;
                   case 'r': sb.append('\r'); i++; break;
                   case 'b': sb.append('\b'); i++; break;
                   case 'f': sb.append('\f'); i++; break;
                   case '\\': sb.append('\\'); i++; break;
                   case '\'': sb.append('\''); i++; break;
                   case '"': sb.append('"'); i++; break;
                   default:
                       // 未识别转义序列,保留反斜杠
                       sb.append('\\');
                       break;
               }
           } else {
               sb.append(ch);
           }
       }
       return sb.toString();
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to