DiogoP98 opened a new pull request, #8589:
URL: https://github.com/apache/storm/pull/8589

   ## What is the purpose of the change
   
   Fixes a silent failure of the topology spout lag query: the Storm UI's "Topology Lag" panel stops working and the following warning is logged repeatedly for every Kafka spout:
   
   ```
     TopologySpoutLag [WARN] Exception thrown while getting lag for spout id: <spoutId>
     TopologySpoutLag [WARN] Exception message:null
     java.lang.ClassCastException
   ```
   
   The regression was introduced by the Kafka client bump from 3.9.0 → 4.x (#8243). It consists of two issues on the same code path:
   
   1. KafkaOffsetLagUtil: NPE on partitions with no committed offset. The Kafka 4 API replaced `consumer.committed(TopicPartition)` (returns `OffsetAndMetadata` or null) with `consumer.committed(Set<TopicPartition>)` (returns a `Map`). The code written for the bump null-checks the map (which is never null) and then dereferences `map.get(tp).offset()`, but the map's value for a partition with no committed offset is null, so `.offset()` throws NullPointerException. The fix moves the null check to the map value via a small `resolveCommittedOffset(...)` helper that returns -1 (the existing "no commit" sentinel), restoring the pre-4.x behavior.
   2. TopologySpoutLag: ClassCastException swallows the real error. When the kafka-monitor subprocess fails (the NPE above, or any other failure: broker unreachable, auth failure, bad config, etc.), it prints a plain-text error to stdout. TopologySpoutLag then runs `(Map<String, Object>) JSONValue.parseWithException(stdout)`. json-smart parses unquoted plain text leniently as a String rather than throwing ParseException, so the existing `catch (ParseException)` doesn't fire and the unchecked cast throws ClassCastException, losing the original error text. The fix uses `instanceof Map`, so non-Map parse results (and unparseable input) are surfaced through the existing errorInfo channel and shown in the UI instead of being dropped.
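
For readers who want to see the shape of the first fix, here is a minimal sketch of a null-safe lookup in the spirit of `resolveCommittedOffset(...)`. It is not the code from this PR: the helper's real signature is not shown in this description, and `CommittedOffset` is a hypothetical stand-in for Kafka's `OffsetAndMetadata` so the example compiles without the Kafka client on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class CommittedOffsetSketch {

    // Hypothetical stand-in for org.apache.kafka.clients.consumer.OffsetAndMetadata.
    static final class CommittedOffset {
        private final long offset;

        CommittedOffset(long offset) {
            this.offset = offset;
        }

        long offset() {
            return offset;
        }
    }

    // The "no commit" sentinel mentioned in the description.
    static final long NO_COMMIT = -1L;

    // Assumed signature: checks the map VALUE for null, not just the map itself.
    static <K> long resolveCommittedOffset(Map<K, CommittedOffset> committed, K partition) {
        CommittedOffset value = (committed == null) ? null : committed.get(partition);
        return (value == null) ? NO_COMMIT : value.offset();
    }

    public static void main(String[] args) {
        Map<String, CommittedOffset> committed = new HashMap<>();
        committed.put("events-0", new CommittedOffset(42L));
        committed.put("events-1", null); // partition with no committed offset

        System.out.println(resolveCommittedOffset(committed, "events-0"));
        System.out.println(resolveCommittedOffset(committed, "events-1"));
    }
}
```

The point of the sketch is only that the null check sits on the per-partition value; a partition that is present in the map with a null value, or absent from it, both resolve to the sentinel.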
   
   Together, (1) restores correct lag reporting for partitions with no committed offset, and (2) ensures that any future monitor failure is reported with its original error text instead of as a generic ClassCastException with a null message.
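
The second fix can be illustrated with a small sketch of the `instanceof Map` check applied to an already-parsed value. This is an assumption-laden illustration, not the PR's code: the method name `interpret` and the result keys `spoutLagResult` and `errorInfo` are hypothetical names chosen for the example, and the JSON parsing step itself is left out so the sketch has no json-smart dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LagResultSketch {

    // 'parsed' is whatever the JSON parser returned for the monitor's stdout;
    // 'rawOutput' is the stdout text itself. Both parameter names are illustrative.
    static Map<String, Object> interpret(Object parsed, String rawOutput) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (parsed instanceof Map) {
            // A JSON object: treat it as the lag result.
            out.put("spoutLagResult", parsed);
        } else {
            // A String, a number, or null: keep the original text as the error
            // instead of casting and losing it to a ClassCastException.
            out.put("errorInfo", rawOutput);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> lag = new LinkedHashMap<>();
        lag.put("spoutLag", 7L);

        System.out.println(interpret(lag, "{\"spoutLag\":7}"));
        System.out.println(interpret("Unable to get offset lags", "Unable to get offset lags"));
    }
}
```

Because `instanceof` is false for null, the same branch also covers a parser that returns nothing for empty output.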
   
   ## How was the change tested
   
   Manual verification on a Storm 2.8.7 cluster:
   - Built the storm-core and storm-kafka-monitor JARs and replaced them on the storm-ui node.
   - Confirmed that the previously failing spouts (managementKafka, secondaryKafka, Instructions, …) now return lag results in the UI with no TopologySpoutLag warnings in ui.log.
   - For a spout with a fresh consumer group (no committed offsets yet), confirmed that lag is reported correctly instead of throwing.
   
   Issue: https://github.com/apache/storm/issues/8588


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
