Jackie-Jiang commented on code in PR #18429:
URL: https://github.com/apache/pinot/pull/18429#discussion_r3191917091


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -38,24 +39,41 @@
 import org.apache.pinot.core.util.NumberUtils;
 import org.apache.pinot.core.util.NumericException;
 import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.BooleanUtils;
 import org.apache.pinot.spi.utils.JsonUtils;
+import org.apache.pinot.spi.utils.TimestampUtils;
 import org.roaringbitmap.RoaringBitmap;
 
 
-/**
- * The <code>JsonExtractScalarTransformFunction</code> class implements the 
json path transformation based on
- * <a href="https://goessner.net/articles/JsonPath/";>Stefan Goessner JsonPath 
implementation.</a>.
- *
- * Please note, currently this method only works with String field. The values 
in this field should be Json String.
- *
- * Usage:
- * jsonExtractScalar(jsonFieldName, 'jsonPath', 'resultsType')
- * <code>jsonFieldName</code> is the Json String field/expression.
- * <code>jsonPath</code> is a JsonPath expression which used to read from JSON 
document
- * <code>results_type</code> refers to the results data type, could be INT, 
LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING,
- * INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.
- *
- */
+/// Implements the `jsonExtractScalar(jsonField, jsonPath, resultsType[, 
defaultValue])` transform.
+/// Reads a JSON document from `jsonField` for each row, resolves the
+/// [Stefan Goessner JsonPath](https://goessner.net/articles/JsonPath/) 
expression against it, and
+/// converts the resolved value to `resultsType`.
+///
+/// **Arguments:**
+/// - `jsonField` — single-value `STRING` or `BYTES` column / transform 
expression containing JSON.
+/// - `jsonPath` — JsonPath expression used to read the value.
+/// - `resultsType` — Pinot data type for the output. Append `_ARRAY` for 
multi-value results.
+/// - `defaultValue` (optional) — used when the path resolves to `null` or 
fails. Without it, unresolved
+///   SV rows throw `IllegalArgumentException`; MV rows surface as empty 
arrays, but null elements within
+///   a resolved array still throw.
+///
+/// **Supported `resultsType`:** `INT`, `LONG`, `FLOAT`, `DOUBLE`, 
`BIG_DECIMAL`, `BOOLEAN`, `TIMESTAMP`,
+/// `STRING`, `JSON`, `BYTES`, plus `_ARRAY` variants of `INT` / `LONG` / 
`FLOAT` / `DOUBLE` /
+/// `BIG_DECIMAL` / `STRING`.
+///
+/// **Per-row coercion** of the JsonPath result to `resultsType`:
+/// - `BOOLEAN` (stored as `INT`) follows Pinot's numeric convention — any 
non-zero `Number` is true;
+///   `Boolean` and `"true"` / `"TRUE"` / `"1"` strings (via 
[BooleanUtils#toInt(String)]) are also true.
+/// - `TIMESTAMP` (stored as `LONG`) accepts numeric epoch millis directly; 
strings go through
+///   [TimestampUtils#toMillisSinceEpoch] (ISO-8601 and numeric millis 
strings).
+/// - `STRING` returns `String` values as-is; other JSON values are serialized 
via
+///   [JsonUtils#objectToString].
+/// - `BIG_DECIMAL` and `STRING` paths use a BigDecimal-preserving JSON parser
+///   (`JSON_PARSER_CONTEXT_WITH_BIG_DECIMAL`) to avoid precision loss on 
numeric values; other paths use
+///   the default parser.
+/// - Other types coerce via `Number` cast (preserved as the canonical 
primitive form) or
+///   `parse*(toString())`.

Review Comment:
   `///` is JEP 467 Markdown Javadoc, fully supported by the `javadoc` tool. 
It's the project-preferred style for new Javadoc — renders correctly in IDEs 
and `javadoc` HTML output. Keeping as-is.



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -437,17 +411,17 @@ public int[][] transformToIntValuesMV(ValueBlock 
valueBlock) {
       int numValues = result.size();
       int[] values = new int[numValues];
       for (int j = 0; j < numValues; j++) {
-        Integer value = result.get(j);
-        if (value == null) {
+        Object element = result.get(j);
+        if (element == null) {
           if (_defaultValue != null) {
-            value = ((Number) _defaultValue).intValue();
-          } else {
-            throw new IllegalArgumentException(
-                "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
-                    + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
+            values[j] = defaultValue;
+            continue;
           }
+          throw new IllegalArgumentException(
+              "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
+                  + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");

Review Comment:
   The `json_extract_scalar` (snake_case) form is pre-existing from #16683 and 
appears identically in 6 messages — not introduced by this PR. This PR is 
scoped to type-coercion correctness; the cosmetic naming inconsistency is 
better handled as a separate one-liner touching all 6 messages at once.



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -577,23 +607,87 @@ public String[][] transformToStringValuesMV(ValueBlock 
valueBlock) {
       int numValues = result.size();
       String[] values = new String[numValues];
       for (int j = 0; j < numValues; j++) {
-        String value = result.get(j);
-        if (value == null) {
+        Object element = result.get(j);
+        if (element == null) {
           if (_defaultValue != null) {
-            value = _defaultValue.toString();
-          } else {
-            throw new IllegalArgumentException(
-                "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
-                    + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
+            values[j] = defaultValue;
+            continue;
           }
+          throw new IllegalArgumentException(
+              "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
+                  + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
         }
-        values[j] = value;
+        values[j] = toString(element);
       }
       _stringValuesMV[i] = values;
     }
     return _stringValuesMV;
   }
 
+  private static int toInt(Object value, boolean isBoolean) {
+    if (isBoolean) {
+      if (value instanceof Boolean) {
+        return (Boolean) value ? 1 : 0;
+      }
+      // For BOOLEAN result, follow PinotDataType numeric convention: non-zero 
number → true.
+      if (value instanceof Number) {
+        return ((Number) value).doubleValue() != 0 ? 1 : 0;
+      }
+      // String fallback: BooleanUtils.toInt accepts "true" / "TRUE" / "1".
+      return BooleanUtils.toInt(value.toString());
+    }
+    if (value instanceof Number) {
+      return ((Number) value).intValue();
+    }
+    return Integer.parseInt(value.toString());
+  }
+
+  private static long toLong(Object value, boolean isTimestamp) {
+    if (value instanceof Number) {
+      return ((Number) value).longValue();
+    }
+    if (isTimestamp) {
+      return TimestampUtils.toMillisSinceEpoch(value.toString());
+    }
+    try {
+      return NumberUtils.parseJsonLong(value.toString());
+    } catch (NumericException nfe) {
+      throw new NumberFormatException("For input string: \"" + value + "\"");
+    }
+  }
+
+  private static float toFloat(Object value) {
+    if (value instanceof Number) {
+      return ((Number) value).floatValue();
+    }
+    return Float.parseFloat(value.toString());
+  }
+
+  private static double toDouble(Object value) {
+    if (value instanceof Number) {
+      return ((Number) value).doubleValue();
+    }
+    return Double.parseDouble(value.toString());
+  }
+
+  private static BigDecimal toBigDecimal(Object value) {
+    if (value instanceof BigDecimal) {
+      return (BigDecimal) value;
+    }
+    return new BigDecimal(value.toString());
+  }
+
+  private static String toString(Object value) {
+    if (value instanceof String) {
+      return (String) value;
+    }
+    try {
+      return JsonUtils.objectToString(value);
+    } catch (JsonProcessingException e) {
+      throw new RuntimeException("Caught exception while serializing JSON 
value: " + value, e);
+    }

Review Comment:
   `value` here is the JsonPath result that Jackson failed to serialize — it's 
the only diagnostic signal a user has when serialization fails (e.g., circular 
reference, non-JSONable type, custom class). Truncating it would hide the 
actual problem; the original `JsonProcessingException` is already chained for 
the stack. Other JSON-serialization paths in Pinot don't truncate values in 
error messages either.



##########
pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunctionTest.java:
##########
@@ -653,4 +653,177 @@ public void mvWithNullsWithDefault(String resultsType, 
String defaultValSql) {
         // TODO: Change the framework to do not duplicate segments when only 
one segment is used
         .thenResultIs(expectedRow, expectedRow); // 2 rows because of segment 
duplication
   }
+
+  // === Per-stored-type coercion tests for the value-handling helpers ===
+
+  /// Runs `SELECT jsonExtractScalar(json, '$.v', resultsType) FROM testTable` 
against a single-row table
+  /// containing the given JSON document, and asserts the result for the 
(always-duplicated) two
+  /// expected rows.

Review Comment:
   Same as the class-header thread — `///` is JEP 467 Markdown Javadoc, the 
project-preferred style. No tooling issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to