DiogoP98 opened a new pull request, #8589:
URL: https://github.com/apache/storm/pull/8589
## What is the purpose of the change
Fixes the silent failure of the topology spout lag query, where the Storm
UI's "Topology Lag" panel stops working with this warning being logged
repeatedly for every Kafka spout:
```
TopologySpoutLag [WARN] Exception thrown while getting lag for spout id:
<spoutId>
TopologySpoutLag [WARN] Exception message:null
java.lang.ClassCastException
```
The regression was introduced by the Kafka client bump from 3.9.0 → 4.x
(#8243). Two issues, on the same code path:
1. KafkaOffsetLagUtil — NPE on partitions with no committed offset. The
Kafka 4 API replaced consumer.committed(TopicPartition) (returns
OffsetAndMetadata or null) with consumer.committed(Set<TopicPartition>)
(returns a Map). The new code null-checks the map (which is never null), then
dereferences map.get(tp).offset() — but the map's value for a partition
with no committed offset is null, so .offset() throws NullPointerException. The
fix moves the null check to the map value via a small
resolveCommittedOffset(...) helper that returns -1 (the existing "no commit"
sentinel), restoring the pre-4.x behavior.
2. TopologySpoutLag — ClassCastException swallows the real error. When the
kafka-monitor subprocess fails (the NPE above, or any other failure: broker
unreachable, auth failure, bad config, etc.), it prints a plain-text error to
stdout. TopologySpoutLag then runs (Map<String, Object>)
JSONValue.parseWithException(stdout). json-smart parses unquoted plain
text leniently as a String rather than throwing ParseException, so the existing
catch (ParseException) doesn't fire and the unchecked cast throws
ClassCastException — losing the original error text. The fix uses
instanceof Map so non-Map parse results (and unparseable input) are
surfaced through the existing errorInfo channel and shown in the UI instead of
being dropped.
Together, (1) restores correct lag reporting for partitions with no
committed offset, and (2) ensures any future monitor failure is reported
usefully instead of as a generic ClassCastException with a null message.
## How was the change tested
Manual verification on a Storm 2.8.7 cluster:
- Built storm-core and storm-kafka-monitor JARs and replaced them on the
storm-ui node.
- Confirmed the previously-failing spouts (managementKafka, secondaryKafka,
Instructions, …) now return lag results in the UI with no TopologySpoutLag
warnings in ui.log.
- For a spout with a fresh consumer group (no committed offsets yet),
confirmed lag reports correctly instead of throwing.
Issue: https://github.com/apache/storm/issues/8588
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]