[ 
https://issues.apache.org/jira/browse/STORM-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062565#comment-16062565
 ] 

Stig Rohde Døssing edited comment on STORM-2600 at 6/26/17 6:14 AM:
--------------------------------------------------------------------

[~pshah] That doesn't seem to match the behavior I'm seeing though. It looks to 
me like the getTopicsString value is put into the component configuration once, 
and then never updated, which is why the lag endpoint requests offsets for an 
empty topic list in the test I mentioned in this comment 
https://github.com/apache/storm/pull/2150#discussion_r123901054.

The lag endpoint uses the StormTopology ComponentCommon to read the 
getTopicsString value 
https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/storm-core/src/jvm/org/apache/storm/utils/TopologySpoutLag.java#L143.
 As far as I can tell, that value is set here 
https://github.com/apache/storm/blob/9e31509d47c4e91c1009f55c7ccf321d7d7e63aa/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java#L542,
 getting fetched through getComponentConfiguration where the spout sets the 
topic list 
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L525.
  

The problem is that the StormTopology ComponentCommon value doesn't seem to be 
updated after the topology is built, so it doesn't matter what getTopicsString 
returns once the spout is up and running, because it'll never be propagated to 
the lag endpoint.

About point 2, the worry is a minor one, but my concern isn't that the Kafka 
API breaks, since we use such a small part of it for storm-kafka-monitor. My 
concern is that the KafkaConsumer is updated to use new request versions (e.g. 
FetchRequest) toward Kafka. Generally older request versions are only supported 
for one release. In this case you wouldn't necessarily have to make any changes 
to your topology jar beyond replacing the client library, but you would have to 
rebuild storm-kafka-monitor since the jar is shaded in. Like I said, it's a 
fairly minor thing.

About 3, I agree that it is possible to work around, and that there probably 
aren't that many Windows users. I just thought it was worth mentioning, since 
this is a case where there is no intrinsic reason we can't support Windows 
(unlike for example cgroups support), so I see it as a minor disadvantage to 
doing offset lag tracking this way.

I agree that it would be good to have some opinions on whether a metrics-based 
solution would be desirable. My main concern about the current implementation 
is that I don't think point 1 works. The other two are just nitpicking to some 
extent.


was (Author: srdo):
@priyank5485 That doesn't seem to match the behavior I'm seeing though. It 
looks to me like the getTopicsString value is put into the component 
configuration once, and then never updated, which is why the lag endpoint 
requests offsets for an empty topic list in the test I mentioned in this 
comment https://github.com/apache/storm/pull/2150#discussion_r123901054.

The lag endpoint uses the StormTopology ComponentCommon to read the 
getTopicsString value 
https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/storm-core/src/jvm/org/apache/storm/utils/TopologySpoutLag.java#L143.
 As far as I can tell, that value is set here 
https://github.com/apache/storm/blob/9e31509d47c4e91c1009f55c7ccf321d7d7e63aa/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java#L542,
 getting fetched through getComponentConfiguration where the spout sets the 
topic list 
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L525.
  

The problem is that the StormTopology ComponentCommon value doesn't seem to be 
updated after the topology is built, so it doesn't matter what getTopicsString 
returns once the spout is up and running, because it'll never be propagated to 
the lag endpoint.

About point 2, the worry is a minor one, but my concern isn't that the Kafka 
API breaks, since we use such a small part of it for storm-kafka-monitor. My 
concern is that the KafkaConsumer is updated to use new request versions (e.g. 
FetchRequest) toward Kafka. Generally older request versions are only supported 
for one release. In this case you wouldn't necessarily have to make any changes 
to your topology jar beyond replacing the client library, but you would have to 
rebuild storm-kafka-monitor since the jar is shaded in. Like I said, it's a 
fairly minor thing.

About 3, I agree that it is possible to work around, and that there probably 
aren't that many Windows users. I just thought it was worth mentioning, since 
this is a case where there is no intrinsic reason we can't support Windows 
(unlike for example cgroups support), so I see it as a minor disadvantage to 
doing offset lag tracking this way.

I agree that it would be good to have some opinions on whether a metrics-based 
solution would be desirable. My main concern about the current implementation 
is that I don't think point 1 works. The other two are just nitpicking to some 
extent.

> Improve or replace storm-kafka-monitor
> --------------------------------------
>
>                 Key: STORM-2600
>                 URL: https://issues.apache.org/jira/browse/STORM-2600
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-monitor
>    Affects Versions: 2.0.0
>            Reporter: Stig Rohde Døssing
>            Priority: Minor
>
> The storm-kafka-monitor module, which is used by Storm UI to show offset lag 
> for topologies with Kafka spouts, has some shortcomings:
> * The Storm UI integration code doesn't seem to be able to support topic 
> subscriptions that change after topology submission. The UI code 
> (https://github.com/apache/storm/blob/64e29f365c9b5d3e15b33f33ab64e200345333e4/storm-core/src/jvm/org/apache/storm/utils/TopologySpoutLag.java#L91)
>  gets the topic list it should request offset lag for via the spout's 
> getComponentConfiguration method, as far as I can tell through this call 
> https://github.com/apache/storm/blob/9e31509d47c4e91c1009f55c7ccf321d7d7e63aa/storm-client/src/jvm/org/apache/storm/topology/TopologyBuilder.java#L541.
>  It seems like the component configuration is intended to be static once the 
> topology has started running. This prevents us from showing the right topic 
> list for subscriptions that are not known at submission time, which is 
> currently the case for Pattern subscriptions. The topic list for that type of 
> subscription isn't known until the spout has started the KafkaConsumer in 
> {{ISpout.open()}}. I don't see a way to fix this, unless there is some way to 
> update the component configuration when the subscription changes.
> * The jar is installed along with the cluster, and depends on the Kafka 
> version specified in Storm's root POM. Kafka guarantees backwards compatible 
> client-server communication for one release only, so there's a potential 
> coupling between Storm cluster version and Kafka version. If users want to 
> update the Kafka version in storm-kafka-monitor, they have to rebuild that 
> module and replace the jar in their Storm install.
> * The UI integration uses the storm-kafka-monitor Bash script to start the 
> monitoring code, in order to avoid a dependency between storm-core and 
> storm-kafka-monitor. This prevents the UI integration from working on 
> Windows. We could supply a Windows script as well, but then we'd need to keep 
> the two in sync.
> I am wondering if these problems could be solved by implementing offset lag 
> monitoring via the metrics system instead. The spout could periodically seek 
> to the log end offset and submit a metric for how far behind the committed 
> offset is, then seek back to where it left off.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to