KKcorps commented on code in PR #9511:
URL: https://github.com/apache/pinot/pull/9511#discussion_r988628937


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -51,24 +63,35 @@ public SanitizationTransformer(Schema schema) {
   @Override
   public GenericRow transform(GenericRow record) {
     for (Map.Entry<String, Integer> entry : 
_stringColumnMaxLengthMap.entrySet()) {
-      String stringColumn = entry.getKey();
-      int maxLength = entry.getValue();
-      Object value = record.getValue(stringColumn);
-      if (value instanceof String) {
-        // Single-valued column
-        String stringValue = (String) value;
-        String sanitizedValue = StringUtil.sanitizeStringValue(stringValue, 
maxLength);
-        // NOTE: reference comparison
-        //noinspection StringEquality
-        if (sanitizedValue != stringValue) {
-          record.putValue(stringColumn, sanitizedValue);
+      try {
+        String stringColumn = entry.getKey();
+        int maxLength = entry.getValue();
+        Object value = record.getValue(stringColumn);
+        if (value instanceof String) {
+          // Single-valued column
+          String stringValue = (String) value;
+          String sanitizedValue = StringUtil.sanitizeStringValue(stringValue, 
maxLength);
+          // NOTE: reference comparison
+          //noinspection StringEquality
+          if (sanitizedValue != stringValue) {
+            record.putValue(stringColumn, sanitizedValue);
+          }
+        } else {
+          // Multi-valued column
+          Object[] values = (Object[]) value;
+          int numValues = values.length;
+          for (int i = 0; i < numValues; i++) {
+            values[i] = StringUtil.sanitizeStringValue(values[i].toString(), 
maxLength);
+          }
         }
-      } else {
-        // Multi-valued column
-        Object[] values = (Object[]) value;
-        int numValues = values.length;
-        for (int i = 0; i < numValues; i++) {
-          values[i] = StringUtil.sanitizeStringValue(values[i].toString(), 
maxLength);
+      } catch (Exception e) {
+        if (!_continueOnError) {
+          throw new RuntimeException("Caught exception while sanitizing value 
for column: " + entry.getKey(), e);
+        } else {
+          LOGGER.debug("Caught exception while sanitizing value for column: 
{}", entry.getKey(), e);
+          //TODO: should put NULL here instead of string `null` and use 
NullValueTransformer for appropriate value
+          record.putValue(entry.getKey(), "null");
+          record.putValue(GenericRow.INCOMPLETE_RECORD_KEY, true);

Review Comment:
   Yes. We were using it to increment the metric 
`ServerMeter.INCOMPLETE_REALTIME_ROWS_CONSUMED` as well as log using 
`TransformPipeline.Result.getIncompleteRowCount()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to