icefury71 commented on a change in pull request #6890:
URL: https://github.com/apache/incubator-pinot/pull/6890#discussion_r633920187
##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/util/ConsumingSegmentInfoReader.java
##########
@@ -131,6 +134,51 @@ private String generateServerURL(String tableNameWithType, String endpoint) {
     return String.format("%s/tables/%s/consumingSegmentsInfo", endpoint, tableNameWithType);
   }
+  /**
+   * Utility method to derive ingestion status from consuming segment Info. Status is HEALTHY if
+   * consuming segment info specifies CONSUMING state for all active segments across all servers
+   * including replicas.
+   */
+  public TableStatus.IngestionStatus getIngestionStatus(String tableNameWithType, int timeoutMs) {
+    try {
+      ConsumingSegmentsInfoMap consumingSegmentsInfoMap = getConsumingSegmentsInfo(tableNameWithType, timeoutMs);
+      for (Map.Entry<String, List<ConsumingSegmentInfo>> consumingSegmentInfoEntry : consumingSegmentsInfoMap._segmentToConsumingInfoMap
+          .entrySet()) {
+        String segmentName = consumingSegmentInfoEntry.getKey();
+        List<ConsumingSegmentInfo> consumingSegmentInfoList = consumingSegmentInfoEntry.getValue();
+        if (consumingSegmentInfoList == null || consumingSegmentInfoList.isEmpty()) {
+          String errorMessage = "Did not get any response from servers for segment: " + segmentName;
+          return TableStatus.IngestionStatus.newIngestionStatus(TableStatus.IngestionState.UNHEALTHY, errorMessage);
+        }
+
+        // Check if any responses are missing
+        Set<String> serversForSegment = _pinotHelixResourceManager.getServersForSegment(tableNameWithType, segmentName);
+        if (serversForSegment.size() != consumingSegmentInfoList.size()) {
+          Set<String> serversResponded =
+              consumingSegmentInfoList.stream().map(c -> c._serverName).collect(Collectors.toSet());
+          serversForSegment.removeAll(serversResponded);
+          String errorMessage =
+              "Not all servers responded for segment: " + segmentName + " Missing servers : " + serversForSegment;
+          return TableStatus.IngestionStatus.newIngestionStatus(TableStatus.IngestionState.UNHEALTHY, errorMessage);
+        }
+
+        for (ConsumingSegmentInfo consumingSegmentInfo : consumingSegmentInfoList) {
+          if (consumingSegmentInfo._consumerState
+              .equals(ConsumerState.NOT_CONSUMING.toString())) {
Review comment:
From my reading, whenever ServerGauge.LLC_PARTITION_CONSUMING is set to 0 under an error condition (that is, in every case except when consumption has completed or the segment is transitioning from CONSUMING to ONLINE), LLRealtimeSegmentDataManager.State is also set to ERROR. So this NOT_CONSUMING check should catch any such error condition.
If that's not the case, we can handle it as part of a bug fix for #6322.
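To make the reasoning above concrete, here is a minimal, self-contained Java sketch. It deliberately does not use Pinot's classes: `IngestionHealthSketch`, `SegmentState`, `ConsumerState`, `toConsumerState`, and `ingestionStatus` are hypothetical stand-ins, and the "ERROR surfaces as NOT_CONSUMING" mapping is the assumption stated in this comment, not verified Pinot behavior. Under that assumption, it shows how a replica in an error state would be flagged by a NOT_CONSUMING check, alongside the "no response" and "missing servers" checks from the diff.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the check in getIngestionStatus; not Pinot's real classes.
final class IngestionHealthSketch {

  // Hypothetical subset of a realtime segment data manager's states.
  enum SegmentState { CONSUMING, CATCHING_UP, ERROR }

  // Hypothetical per-replica consumer state, as reported to the controller.
  enum ConsumerState { CONSUMING, NOT_CONSUMING }

  // Assumption taken from the review comment: an ERROR state surfaces as NOT_CONSUMING.
  static ConsumerState toConsumerState(SegmentState state) {
    return state == SegmentState.ERROR ? ConsumerState.NOT_CONSUMING : ConsumerState.CONSUMING;
  }

  // responses: segment name -> (server name -> state reported by that server).
  // expectedServers: segment name -> servers that should host a replica of it.
  static String ingestionStatus(Map<String, Set<String>> expectedServers,
                                Map<String, Map<String, SegmentState>> responses) {
    for (Map.Entry<String, Map<String, SegmentState>> entry : responses.entrySet()) {
      String segment = entry.getKey();
      Map<String, SegmentState> replicas = entry.getValue();
      // No server answered for this segment.
      if (replicas == null || replicas.isEmpty()) {
        return "UNHEALTHY: no response for " + segment;
      }
      // Some expected replicas did not answer.
      Set<String> missing = new HashSet<>(expectedServers.getOrDefault(segment, Set.of()));
      missing.removeAll(replicas.keySet());
      if (!missing.isEmpty()) {
        return "UNHEALTHY: missing servers " + missing + " for " + segment;
      }
      // Any replica that is not consuming makes the table unhealthy.
      for (Map.Entry<String, SegmentState> replica : replicas.entrySet()) {
        if (toConsumerState(replica.getValue()) == ConsumerState.NOT_CONSUMING) {
          return "UNHEALTHY: " + replica.getKey() + " not consuming " + segment;
        }
      }
    }
    return "HEALTHY";
  }
}
```

One design difference in this sketch: it compares the sets of expected and responding servers directly rather than comparing their sizes first, so an unexpected extra responder cannot mask a missing one.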
At Uber, AFAIK, we rely on the ingestion time lag (from Kafka) to detect ingestion problems.
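For comparison, a lag-based signal like the one mentioned here could be sketched as follows. This is a hypothetical illustration, not Uber's or Pinot's implementation: `IngestionLagSketch`, `ingestionLag`, and `laggingPartitions` are made-up names, and the sketch assumes the timestamp of the newest ingested event is available per partition. It defines lag as `now - lastEventTime`, clamped at zero, and flags partitions whose lag exceeds a threshold. Unlike the consumer-state check, such a signal can also surface a consumer that still reports CONSUMING but has stalled.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lag-based health check; not Pinot's or Uber's actual implementation.
final class IngestionLagSketch {

  // Lag = time between the newest ingested event and "now", clamped at zero
  // so that a producer clock running ahead does not yield a negative lag.
  static Duration ingestionLag(Instant lastIngestedEventTime, Instant now) {
    Duration lag = Duration.between(lastIngestedEventTime, now);
    return lag.isNegative() ? Duration.ZERO : lag;
  }

  // Returns the partitions (in ascending order) whose lag exceeds the threshold.
  static List<Integer> laggingPartitions(Map<Integer, Instant> lastEventTimeByPartition,
                                         Instant now, Duration threshold) {
    List<Integer> lagging = new ArrayList<>();
    for (Map.Entry<Integer, Instant> e : new TreeMap<>(lastEventTimeByPartition).entrySet()) {
      if (ingestionLag(e.getValue(), now).compareTo(threshold) > 0) {
        lagging.add(e.getKey());
      }
    }
    return lagging;
  }
}
```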