Jackie-Jiang commented on a change in pull request #8335:
URL: https://github.com/apache/pinot/pull/8335#discussion_r825477385
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java
##########
@@ -510,42 +505,25 @@ private void processStreamEvents(MessageBatch messagesAndOffsets, long idlePipeS
              .decode(messagesAndOffsets.getMessageAtIndex(index), messagesAndOffsets.getMessageOffsetAtIndex(index),
                  messagesAndOffsets.getMessageLengthAtIndex(index), reuse);
      if (decodedRow != null) {
-        List<GenericRow> transformedRows = new ArrayList<>();
+        TransformPipeline.Result result = new TransformPipeline.Result();
        try {
-          if (_complexTypeTransformer != null) {
-            // TODO: consolidate complex type transformer into composite type transformer
-            decodedRow = _complexTypeTransformer.transform(decodedRow);
-          }
-          Collection<GenericRow> rows = (Collection<GenericRow>) decodedRow.getValue(GenericRow.MULTIPLE_RECORDS_KEY);
-          if (rows != null) {
-            for (GenericRow row : rows) {
-              GenericRow transformedRow = _recordTransformer.transform(row);
-              if (transformedRow != null && IngestionUtils.shouldIngestRow(row)) {
-                transformedRows.add(transformedRow);
-              } else {
-                realtimeRowsDroppedMeter =
-                    _serverMetrics.addMeteredTableValue(_metricKeyName, ServerMeter.INVALID_REALTIME_ROWS_DROPPED, 1,
-                        realtimeRowsDroppedMeter);
-              }
-            }
-          } else {
-            GenericRow transformedRow = _recordTransformer.transform(decodedRow);
-            if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
-              transformedRows.add(transformedRow);
-            } else {
-              realtimeRowsDroppedMeter =
-                  _serverMetrics.addMeteredTableValue(_metricKeyName, ServerMeter.INVALID_REALTIME_ROWS_DROPPED, 1,
-                      realtimeRowsDroppedMeter);
-            }
-          }
-        } catch (Exception e) {
+          result = _transformPipeline.processRow(decodedRow);
+        } catch (TransformPipeline.TransformException e) {
          _numRowsErrored++;
          String errorMessage = String.format("Caught exception while transforming the record: %s", decodedRow);
          _segmentLogger.error(errorMessage, e);
          _realtimeTableDataManager.addSegmentError(_segmentNameStr,
              new SegmentErrorInfo(System.currentTimeMillis(), errorMessage, e));
+          // for a row with multiple records (multi rows), if we encounter exception in the middle,
+          // there could be some rows that are processed successfully. We still wish to process them.
+          result = e.getPartialResult();
Review comment:
Understood that you are trying to keep the exact same behavior, but it can make the code less readable, and I feel the original behavior might not be desired and was made that way unintentionally. Keeping partial results (if that can ever happen) adds an extra dependency on the order in which records are transformed, which might not be deterministic. If we want to keep as many records ingested as possible, we should catch the exception within the pipeline, track the errored record count and the exception, then keep processing the remaining records (see the sketch below). Passing the result out of the exception is not very clean to me.
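
To make the suggestion concrete, here is a rough sketch, not the actual `TransformPipeline` code in this PR: the `Result` helpers (`addTransformedRow`, `incrementSkippedRowCount`, `incrementErrorCount`, `recordException`) are hypothetical names used only to illustrate catching per-record failures inside the pipeline so the remaining records keep flowing.

```java
// Hypothetical sketch only; the Result helper methods below are made-up names,
// not the API introduced by this PR.
public Result processRow(GenericRow decodedRow) {
  Result result = new Result();
  if (_complexTypeTransformer != null) {
    decodedRow = _complexTypeTransformer.transform(decodedRow);
  }
  Collection<GenericRow> rows = (Collection<GenericRow>) decodedRow.getValue(GenericRow.MULTIPLE_RECORDS_KEY);
  if (rows == null) {
    rows = Collections.singletonList(decodedRow);
  }
  for (GenericRow row : rows) {
    try {
      GenericRow transformedRow = _recordTransformer.transform(row);
      if (transformedRow != null && IngestionUtils.shouldIngestRow(transformedRow)) {
        result.addTransformedRow(transformedRow);
      } else {
        // Row was filtered out or failed validation: count it, but do not abort the batch.
        result.incrementSkippedRowCount();
      }
    } catch (Exception e) {
      // Track the error and the exception, then keep processing the remaining records,
      // so a bad record in the middle of a multi-record message does not drop the rest.
      result.incrementErrorCount();
      result.recordException(e);
    }
  }
  return result;
}
```

The caller would then read the counts and any recorded exception off the returned `Result` and update the server metrics, instead of pulling a partial result out of a thrown exception.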
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]