kitalkuyo-gita commented on code in PR #674:
URL: https://github.com/apache/geaflow/pull/674#discussion_r2532614361
##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
private static final String DEFAULT_N_PREDICT = "128";
- private static final String TEMPLATE = "{"
+ private static final String JSON_REQUEST_TEMPLATE = "{"
Review Comment:
The default predict handling is unromantic: the class contains the constant
`DEFAULT_N_PREDICT` ("128"), but the code directly inserts it using the Integer
value from `llm.getPredict()`. If `llm.getPredict()` returns null, it inserts
"null" (the result of `String.format`), resulting in a null value for
`n_predict` in the generated JSON object—likely not the behavior expected by
the backend. `DEFAULT_N_PREDICT` is also unused, appearing redundant.
Recommendation: Use the default value of `int` (128) instead of the string
`DEFAULT_N_PREDICT`, or change `DEFAULT_N_PREDICT` to `int` and use a fallback
in `getJsonString` when `llm.getPredict()` returns null.
##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
private static final String DEFAULT_N_PREDICT = "128";
- private static final String TEMPLATE = "{"
+ private static final String JSON_REQUEST_TEMPLATE = "{"
+ "\"prompt\": \"%s\","
+ "\"n_predict\": %s}";
private String getJsonString(LocalConfigArgsClass llm, String prompt) {
Integer predict = llm.getPredict();
// trim() to replace the last \n
- return String.format(TEMPLATE, prompt.trim(), predict);
Review Comment:
The code directly formats the user-provided prompt into a JSON string
("prompt": "%s") using `String.format` without performing JSON escaping (e.g.,
double quotes, backslashes, newlines, control characters, etc.). If the prompt
contains double quotes or backslashes, it will result in invalid JSON and may
even lead to SQL injection vulnerabilities.
Recommendation: Use an existing JSON utility library (such as Jackson or
Gson) to construct JSON instead of manually concatenating strings. This will
automatically handle escaping and type encoding.
##########
geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/llm/LocalClient.java:
##########
@@ -35,14 +35,14 @@ public class LocalClient extends LLMClient {
private static final String DEFAULT_N_PREDICT = "128";
- private static final String TEMPLATE = "{"
+ private static final String JSON_REQUEST_TEMPLATE = "{"
+ "\"prompt\": \"%s\","
+ "\"n_predict\": %s}";
private String getJsonString(LocalConfigArgsClass llm, String prompt) {
Integer predict = llm.getPredict();
// trim() to replace the last \n
- return String.format(TEMPLATE, prompt.trim(), predict);
Review Comment:
example A:
```
private String getJsonString(LocalConfigArgsClass llm, String prompt) {
int predict = llm.getPredict() == null ? 128 : llm.getPredict();
ObjectNode root = new ObjectMapper().createObjectNode();
root.put("prompt", prompt.trim());
root.put("n_predict", predict);
return root.toString();
}
```
example B(Without importing any libraries, perform minimal JSON escaping +
default):
```
private static final int DEFAULT_N_PREDICT_INT = 128;
private static String escapeJsonString(String s) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
switch (ch) {
case '"': sb.append("\\\""); break;
case '\\': sb.append("\\\\"); break;
case '\b': sb.append("\\b"); break;
case '\f': sb.append("\\f"); break;
case '\n': sb.append("\\n"); break;
case '\r': sb.append("\\r"); break;
case '\t': sb.append("\\t"); break;
default:
if (ch < 0x20) {
sb.append(String.format("\\u%04x", (int) ch));
} else {
sb.append(ch);
}
}
}
return sb.toString();
}
private String getJsonString(LocalConfigArgsClass llm, String prompt) {
int predict = llm.getPredict() == null ? DEFAULT_N_PREDICT_INT :
llm.getPredict();
return String.format("{\"prompt\":\"%s\",\"n_predict\":%d}",
escapeJsonString(prompt.trim()), predict);
}
```
##########
geaflow/geaflow-dsl/geaflow-dsl-parser/src/main/java/org/apache/geaflow/dsl/util/StringLiteralUtil.java:
##########
@@ -56,7 +56,7 @@ public static String unescapeSQLString(String b) {
int base = i + 2;
for (int j = 0; j < 4; j++) {
int digit = Character.digit(b.charAt(j + base), 16);
- code += digit * multiplier[j];
+ code += digit * UNICODE_HEX_MULTIPLIER[j];
Review Comment:
The original `multiplier` array contained `{1000, 100, 10, 1}`—clearly
decimal weights, not hexadecimal weights. Unicode `\uXXXX` should be calculated
using 16^3, 16^2, 16^1, 16^0, which is `{4096, 256, 16, 1}`. The current code
(regardless of renaming) calculates the wrong character code point, causing
`\uXXXX` to be parsed incorrectly. This is a logical error.
Furthermore, there is no validation for `Character.digit(...)` returning -1.
If a non-hexadecimal character is encountered, a negative value will be
included in the calculation, resulting in an incorrect result without throwing
an explicit exception.
There is no validation for code that may be generated outside the `char`
range (due to inappropriate surrogate). However, since only four hexadecimal
numbers are parsed, their range is 0..0xFFFF, which falls within the `char`
range. The main issues are incorrect weighting and handling of invalid
characters.
Alternatively, simply adjust the weights (not recommended, only a remedy):
Change `multiplier` to `{4096, 256, 16, 1}`, and check `digit < 0` after
`Character.digit`.
Testing Recommendations
Add unit tests to cover:
- `"\u0041" => "A"`
- `"\u00e9" => "é"`
- Confirm the expected behavior (throw an exception or fallback) for input
containing illegal `\u` escape sequences (less than 4 characters or non-hex
characters).
- Handling common escape sequences like `\n` and `\t`.
exampleA:
```
public static String unescapeSQLString(String b) {
StringBuilder sb = new StringBuilder(b.length());
for (int i = 0; i < b.length(); i++) {
char ch = b.charAt(i);
if (ch == '\\' && i + 1 < b.length()) {
char next = b.charAt(i + 1);
if (next == 'u' || next == 'U') {
if (i + 6 <= b.length()) { // \u + 4 hex chars
String hex = b.substring(i + 2, i + 6);
try {
int code = Integer.parseInt(hex, 16);
sb.append((char) code);
i += 5;
continue;
} catch (NumberFormatException e) {
// 解析失败:可以选择抛出异常或保守回退(将原文本原样加入)
// 这里选择把原文本加入并继续
sb.append('\\').append(next);
i++;
continue;
}
} else {
// 不足 4 个字符,按原样处理
sb.append('\\').append(next);
i++;
continue;
}
}
// 处理常见转义
switch (next) {
case 'n': sb.append('\n'); i++; break;
case 't': sb.append('\t'); i++; break;
case 'r': sb.append('\r'); i++; break;
case 'b': sb.append('\b'); i++; break;
case 'f': sb.append('\f'); i++; break;
case '\\': sb.append('\\'); i++; break;
case '\'': sb.append('\''); i++; break;
case '"': sb.append('"'); i++; break;
default:
// 未识别转义序列,保留反斜杠
sb.append('\\');
break;
}
} else {
sb.append(ch);
}
}
return sb.toString();
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]