[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8335: Refactor streaming transformation code so it can be reused in other places

GitBox Sun, 13 Mar 2022 10:05:31 -0700


Jackie-Jiang commented on a change in pull request #8335:
URL: https://github.com/apache/pinot/pull/8335#discussion_r825477385




##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java
##########
@@ -510,42 +505,25 @@ private void processStreamEvents(MessageBatch messagesAndOffsets, long idlePipeS
           .decode(messagesAndOffsets.getMessageAtIndex(index), messagesAndOffsets.getMessageOffsetAtIndex(index),
               messagesAndOffsets.getMessageLengthAtIndex(index), reuse);
       if (decodedRow != null) {
-        List<GenericRow> transformedRows = new ArrayList<>();
+        TransformPipeline.Result result = new TransformPipeline.Result();
         try {
-          if (_complexTypeTransformer != null) {
-            // TODO: consolidate complex type transformer into composite type transformer
-            decodedRow = _complexTypeTransformer.transform(decodedRow);
-          }
-          Collection<GenericRow> rows = (Collection<GenericRow>) decodedRow.getValue(GenericRow.MULTIPLE_RECORDS_KEY);
-          if (rows != null) {
-            for (GenericRow row : rows) {
-              GenericRow transformedRow = _recordTransformer.transform(row);
-              if (transformedRow != null && IngestionUtils.shouldIngestRow(row)) {
-                transformedRows.add(transformedRow);
-              } else {
-                realtimeRowsDroppedMeter =
-                    _serverMetrics.addMeteredTableValue(_metricKeyName, ServerMeter.INVALID_REALTIME_ROWS_DROPPED, 1,
-                        realtimeRowsDroppedMeter);
-              }
-            }
-          } else {
-            GenericRow transformedRow = _recordTransformer.transform(decodedRow);
-            if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
-              transformedRows.add(transformedRow);
-            } else {
-              realtimeRowsDroppedMeter =
-                  _serverMetrics.addMeteredTableValue(_metricKeyName, ServerMeter.INVALID_REALTIME_ROWS_DROPPED, 1,
-                      realtimeRowsDroppedMeter);
-            }
-          }
-        } catch (Exception e) {
+          result = _transformPipeline.processRow(decodedRow);
+        } catch (TransformPipeline.TransformException e) {
           _numRowsErrored++;
           String errorMessage = String.format("Caught exception while transforming the record: %s", decodedRow);
           _segmentLogger.error(errorMessage, e);
           _realtimeTableDataManager.addSegmentError(_segmentNameStr,
               new SegmentErrorInfo(System.currentTimeMillis(), errorMessage, e));
+          // for a row with multiple records (multi rows), if we encounter exception in the middle,
+          // there could be some rows that are processed successfully. We still wish to process them.
+          result = e.getPartialResult();

Review comment:
       Understood that you are trying to keep the exact same behavior, but it
       can make the code less readable, and I feel the original behavior might
       not be desired and was made that way unintentionally. Keeping partial
       results (if that can ever happen) can add an extra dependency on the order
       in which records are transformed, which might not be deterministic. If we
       want to ingest as many records as possible, we should catch the exception
       within the pipeline, track the errored record count and the exception,
       then keep processing the remaining records. Passing the result through
       the exception is not very clean to me.
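One way to read the reviewer's suggestion is the Java sketch below. It is a hypothetical, simplified stand-in, not Pinot's actual `TransformPipeline`: `Map<String, Object>` replaces `GenericRow`, a single `UnaryOperator` replaces the complex-type and record transformers, a `null` return models a filtered row (standing in for both transformer filtering and `IngestionUtils.shouldIngestRow`), metrics are omitted, and the class, field, and key names are all illustrative. Each record is transformed inside its own try/catch, so a failure bumps a counter and remembers the exception instead of aborting the loop, and the caller always receives one normally returned `Result` with no partial-result-on-exception path.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Pinot's TransformPipeline: Map<String, Object> stands in
// for GenericRow and one UnaryOperator stands in for the chain of transformers.
final class RowPipelineSketch {
  // Illustrative key under which a decoded message carries multiple records.
  static final String MULTIPLE_RECORDS_KEY = "$MULTIPLE_RECORDS_KEY$";

  // Everything the caller needs, returned normally instead of via an exception.
  static final class Result {
    final List<Map<String, Object>> transformedRows = new ArrayList<>();
    int skippedRowCount;     // records the transformer filtered out (returned null)
    int erroredRowCount;     // records whose transformation threw
    Exception lastException; // most recent failure, kept for logging
  }

  private final UnaryOperator<Map<String, Object>> _recordTransformer;

  RowPipelineSketch(UnaryOperator<Map<String, Object>> recordTransformer) {
    _recordTransformer = recordTransformer;
  }

  @SuppressWarnings("unchecked")
  Result processRow(Map<String, Object> decodedRow) {
    Result result = new Result();
    Object multipleRecords = decodedRow.get(MULTIPLE_RECORDS_KEY);
    // Treat a single-record message as a one-element batch so both cases share one loop.
    Collection<Map<String, Object>> rows = multipleRecords != null
        ? (Collection<Map<String, Object>>) multipleRecords
        : Collections.singletonList(decodedRow);
    for (Map<String, Object> row : rows) {
      try {
        Map<String, Object> transformedRow = _recordTransformer.apply(row);
        if (transformedRow != null) {
          result.transformedRows.add(transformedRow);
        } else {
          result.skippedRowCount++;
        }
      } catch (Exception e) {
        // Count the failure and continue, so one bad record does not drop the others.
        result.erroredRowCount++;
        result.lastException = e;
      }
    }
    return result;
  }
}
```

With this shape, the caller's catch block for partial results could go away: the segment data manager could add `erroredRowCount` to `_numRowsErrored`, report `lastException` through `addSegmentError` when it is non-null, feed `skippedRowCount` into the dropped-rows meter, and index `transformedRows`. The set of ingested records then no longer depends on where in the batch a failure occurs; only which exception ends up in `lastException` does.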




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


