[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425709#comment-16425709 ]

Robert Joseph Evans commented on STORM-2983:
--------------------------------------------

[~kabhwan] and [~roshan_naik]

"Is there good reason why topology.workers cannot be dynamically updated to 
reflect the actual worker count. "

Yes.

It would require restarting all workers every time there was a scheduling 
change in the number of workers.  We can support it, and if someone shows a 
good use case for why we would need it we can try to do it, but it is far 
from ideal.
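
For reference, the supported way to change the worker count today is the 
rebalance API, and it goes through exactly that restart.  A minimal sketch 
against the Nimbus Thrift client (the topology name, worker count, and wait 
time below are all placeholders):

{code:java}
import java.util.Map;

import org.apache.storm.generated.RebalanceOptions;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class RebalanceExample {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig(); // reads storm.yaml / defaults
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);
        try {
            RebalanceOptions opts = new RebalanceOptions();
            opts.set_num_workers(4); // the new worker count
            opts.set_wait_secs(10);  // drain time before the restart
            // "my-topology" is a placeholder; its workers WILL restart.
            nimbus.getClient().rebalance("my-topology", opts);
        } finally {
            nimbus.close();
        }
    }
}
{code}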

"The  [documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings. "

Yes, but that is not a dynamic override.  This is because the config is 
treated as immutable within the worker.  Additionally, there is a convention 
that configs starting with "topology." are meant for users to set, and for 
the system to read, not the other way around.
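
As a small illustration of the read-only side of that convention: a bolt gets 
the topology config as a snapshot in prepare() and should only ever read from 
it.  A minimal sketch (the bolt itself is hypothetical, not anything in Storm):

{code:java}
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class ConfigReadingBolt extends BaseRichBolt {
    private boolean debug;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        // Read-only: the worker treats this map as an immutable snapshot.
        debug = Boolean.TRUE.equals(stormConf.get(Config.TOPOLOGY_DEBUG));
    }

    @Override
    public void execute(Tuple input) {
        // ... bolt logic would go here ...
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}
{code}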

"I think the topology configuration should be immutable one, though we don't 
guarantee it from the code."

Actually, all currently released versions of Storm are based on Clojure.  
The Map that we pass around is an immutable Clojure map.  I would be 100% on 
board with making the map we pass around in the worker immutable for 2.x as 
well.  Hadoop has so many horrible issues because they do not enforce this in 
their config.  We really don't want to get into the business of allowing this 
within a worker.  The Config object must be mutable because users are creating 
it to launch their topology.  Once it is created we do not change the config, 
except for a few configs we set at topology submission time and through the 
rebalance API, which forces the workers to restart.
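
Concretely, the intended lifecycle is: mutate the Config in your submission 
code, hand it to StormSubmitter, and treat it as frozen after that (the 
topology name and builder contents below are placeholders):

{code:java}
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... setSpout()/setBolt() calls would go here ...

        Config conf = new Config(); // mutable here, at submission time only
        conf.setNumWorkers(2);      // topology.workers
        conf.setDebug(true);        // topology.debug

        // From the workers' point of view the resulting map never changes,
        // short of a rebalance, which restarts them.
        StormSubmitter.submitTopology("example-topology", conf, builder.createTopology());
    }
}
{code}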

 

"We need a way to in code to check the worker count (for internal and user 
code)."

For internal system code I can see a use case for this and I think we can 
provide it.  However, I don't see any value in providing this for user code.  

"as they can query them to find out the state of the topology"

Letting a bolt/spout find out the current state of a topology is something we 
have always been bad at.  I would love to see more of this exposed to topology 
users, but I don't think the config is the right place to expose it.  Also, I 
would want a really solid use case for everything we expose.  The issue is 
that the more we expose to end users, the harder it is to change the internals 
of the system.  If we had exposed something about the outbound queues to end 
users, it would have been much harder to make the changes that got rid of the 
outbound queues.
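
For what it's worth, the static structure of a topology is already visible to 
components through the TopologyContext handed to prepare()/open(); it is the 
live runtime state (queues, placement, load) that has stayed internal.  A 
quick sketch of what is already there ("split" is a placeholder component id):

{code:java}
import java.util.List;
import java.util.Map;

import org.apache.storm.task.TopologyContext;

public class ContextSketch {
    // Imagine this being called from a bolt's prepare() or a spout's open().
    static void inspect(TopologyContext context) {
        String componentId = context.getThisComponentId();  // who am I
        int taskId = context.getThisTaskId();               // which task am I
        List<Integer> downstream = context.getComponentTasks("split"); // static task ids
        Map<Integer, String> taskToComponent = context.getTaskToComponent();
        // None of this reflects live scheduling, queue depth, or load.
    }
}
{code}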

I understand letting the grouping know how overloaded downstream bolts are, 
which we added with load aware groupings.

I understand letting the grouping know where downstream bolts are scheduled, 
which we added with locality aware groupings.
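
For reference, the hooks a grouping sees today look roughly like the sketch 
below: prepare() is where it learns the downstream task list, and load aware 
groupings get an extra LoadMapping argument to chooseTasks().  The round-robin 
logic here is just an illustration:

{code:java}
import java.util.Collections;
import java.util.List;

import org.apache.storm.generated.GlobalStreamId;
import org.apache.storm.grouping.CustomStreamGrouping;
import org.apache.storm.task.WorkerTopologyContext;

public class RoundRobinGrouping implements CustomStreamGrouping {
    private List<Integer> targetTasks;
    private int index = 0;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
        // This is where a grouping learns about the downstream tasks.
        this.targetTasks = targetTasks;
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        int chosen = targetTasks.get(index);
        index = (index + 1) % targetTasks.size();
        return Collections.singletonList(chosen);
    }
}
{code}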

I understand letting a bolt know how full its inbound queue is so it can 
possibly play games with batching data to an external system (something we 
don't do today, but it would be good to).

What I don't understand is what a bolt would do differently in a single-worker 
setup vs a multi-worker setup.  Would they try to communicate with one another 
differently?  Would they send different types or sizes of messages?  I just 
don't see any use case where a bolt or spout would care, but if you have a use 
case we can totally provide it.

 

I agree that the single worker use case is critical for performance.  
Hopefully in the future we can get the scheduler and the routers to be smart 
enough that it is less of an issue, but for now it is needed.

 

> Some topologies not working properly 
> -------------------------------------
>
>                 Key: STORM-2983
>                 URL: https://issues.apache.org/jira/browse/STORM-2983
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 --counters 1 -c topology.debug=true
> {code}
> does not work properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and 
> it will be on a separate worker. It looks like the __acker-executor was not 
> able to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. This then led to another 
> problem: STORM-2970
> I tried to run on Storm right before 
> [https://github.com/apache/storm/pull/2502] and right after, and confirmed 
> that this bug is related to it


