ege-st commented on code in PR #12157:
URL: https://github.com/apache/pinot/pull/12157#discussion_r1474666166
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -140,6 +142,8 @@ public class PinotLLCRealtimeSegmentManager {
// Max time to wait for all LLC segments to complete committing their metadata while stopping the controller.
private static final long MAX_LLC_SEGMENT_METADATA_COMMIT_TIME_MILLIS = 30_000L;
+ private Map<Pair<String, String>, SegmentErrorInfo> _errorCache;
Review Comment:
Just double-checking my understanding of the error cache: is it a map from
each (table, segment) pair on this server to the most recent error message
seen for that pair? In other words, for each server, we'll see only the most
recent error on each segment on that server.
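
To make the semantics being asked about concrete, here is a minimal sketch of a "last error wins" cache. It is an illustration under assumptions, not the PR's code: the class and method names are hypothetical, `Map.Entry` stands in for the `Pair<String, String>` key, and a plain `String` stands in for `SegmentErrorInfo`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the error cache; all names here are illustrative. */
class LastErrorCacheSketch {
  // Key: (tableName, segmentName). Value: the most recent error message.
  private final Map<Map.Entry<String, String>, String> _errorCache = new ConcurrentHashMap<>();

  void recordError(String table, String segment, String message) {
    // put() overwrites any earlier entry for the same (table, segment) key,
    // so only the latest error survives.
    _errorCache.put(new SimpleImmutableEntry<>(table, segment), message);
  }

  /** Returns the latest error for the pair, or null if none was recorded. */
  String latestError(String table, String segment) {
    return _errorCache.get(new SimpleImmutableEntry<>(table, segment));
  }
}
```

With these semantics, recording a missing-offsets error and then a decoding error for the same segment leaves only the decoding error visible.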
1. A longer-term question is how to manage noisy versus non-noisy errors. For
example: if there's a missing-offsets error (which you're monitoring for in
this PR) and a decoding error on 1 in 5 messages, the decoding error will
flood the cache and keep the offset error from being seen.
2. What happens when a table/segment is deleted or moved? The error cache
will still hold entries for segments that no longer exist and will report
invalid information. We have this issue with Ingestion Lag metrics, where it
frequently causes false alerts. If this happens multiple times, we can wind up
with many servers reporting errors for the same segment, which will be
confusing during investigations.
3. If you limit the size of this map, it still needs to cover all the extant
segments on a server, so I'm not sure a fixed limit will work: as far as I
know, the number of segments a single server can have is not strictly
limited. How can we determine what the max size should be?
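
One possible direction for the three points above, sketched under assumptions rather than proposed as the PR's design: keep the latest error per error category for each (table, segment), and drop a segment's entries when it is deleted or moved. The `ErrorCategory` enum, the `onSegmentRemoved` hook, and every other name below are hypothetical; how such a hook would be wired into Pinot is not shown.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the PR. */
class CategorizedErrorCacheSketch {
  // Hypothetical categories; a real list would depend on the errors tracked.
  enum ErrorCategory { MISSING_OFFSET, DECODING }

  // (tableName, segmentName) -> latest message per category.
  private final Map<Map.Entry<String, String>, Map<ErrorCategory, String>> _errors =
      new ConcurrentHashMap<>();

  void recordError(String table, String segment, ErrorCategory category, String message) {
    // A noisy category only overwrites its own slot, so a frequent decoding
    // error cannot hide a rare missing-offset error (point 1).
    _errors.computeIfAbsent(new SimpleImmutableEntry<>(table, segment),
        k -> new ConcurrentHashMap<>()).put(category, message);
  }

  /** Returns the latest error of that category for the pair, or null if none. */
  String latestError(String table, String segment, ErrorCategory category) {
    Map<ErrorCategory, String> perSegment = _errors.get(new SimpleImmutableEntry<>(table, segment));
    return perSegment == null ? null : perSegment.get(category);
  }

  // Assumed to be called when a segment is deleted or moved away (point 2).
  void onSegmentRemoved(String table, String segment) {
    _errors.remove(new SimpleImmutableEntry<>(table, segment));
  }

  // Number of segments with entries; this tracks live segments with errors,
  // so no fixed cap has to be chosen up front (point 3).
  int trackedSegments() {
    return _errors.size();
  }
}
```

In this sketch the map holds at most (segments with errors) x (number of categories) messages, so it scales with the segments actually present rather than with error volume.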
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]